Note: The two data files “bgg_db_2017_04.csv” and “bgg_clean_dat.csv” can be found in the GitHub repo.

Section 1. Overview and Motivation

1.1 Background

We all love Settlers of Catan, but what is it about Catan that makes it so addictive? Board games vary along many dimensions: the minimum and maximum number of players, the length of play, the theme, the mechanics, the designer, and even the difficulty. We are curious which of these attributes actually make a board game a good one, as measured by player ratings. The data collected from boardgamegeek.com give us player ratings and features for thousands of board games. We want to explore these features and, potentially, build models that predict which board games are more likely to be loved by players.

1.2 Objectives

We identified the following objectives for our project:
1. Investigate the possible traits of high-rating board games using the ggplot2 package in R.
2. Build models to predict the success of a new board game and find important features of the successful games (defined by high average player rating).
3. Recommend board games to players based on certain specified criteria or other games that they love.


Section 2. Initial Questions

2.1 Initial Questions

The first question we asked was:
What are the strongest predictors of a board game’s success (in terms of average player rating)?

During our EDA we found that some general variables, such as rank or number of votes, are very predictive of rating but not useful, since they are not features of board games that manufacturers or players can control.

2.2 New Questions

Because of this problem, our question evolved to “what are some useful predictors of a board game’s success (in terms of rating)?” We found that several game categories and game mechanics were predictive of rating, and we focused on developing a predictive model based on these categorical variables together with some general variables, such as weight (game complexity) and year.

Other questions we were curious about include:
1. Whether the data show different patterns across strata, such as age groups and single-player versus multiplayer games.
2. How game preferences and characteristics changed over time (board game evolution).
3. Whether different machine learning methods would predict ratings better, compared by their RMSEs.
4. Whether the variables in our dataset could be used to build a good board game recommender based on Euclidean distances between board game features.
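The recommender idea in question 4 can be illustrated with a minimal sketch. It assumes a small numeric feature matrix with made-up games and features (`game_a`, `cooperative`, and so on are hypothetical, as is the helper `recommend()`); it is not the recommender built for this project.

```r
# Toy feature matrix: rows are games, columns are hypothetical features.
features <- rbind(
  game_a = c(weight = 2.0, cooperative = 0, dice_rolling = 1),
  game_b = c(weight = 2.1, cooperative = 0, dice_rolling = 1),
  game_c = c(weight = 4.0, cooperative = 1, dice_rolling = 0),
  game_d = c(weight = 3.8, cooperative = 1, dice_rolling = 0)
)

# Return the n games closest to `liked` in Euclidean distance.
recommend <- function(features, liked, n = 2) {
  scaled <- scale(features)               # put all features on a common scale
  d <- as.matrix(dist(scaled))[liked, ]   # distances from the liked game
  d <- d[names(d) != liked]               # drop the game itself
  names(sort(d))[seq_len(n)]
}

recommend(features, "game_a", n = 1)
```

Scaling the columns first keeps a wide-ranging feature such as weight from dominating the 0/1 indicator columns in the distance.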


Section 3. Data

3.1 Data Source

Our board game data comes from a data set on Kaggle.com:
https://www.kaggle.com/mrpantherson/board-game-data/data

3.2 Data Cleaning & Processing

Our data cleaning work involves 3 parts:
1. Replace Wrong Values
2. Recode Numerical Values to Categorical Values
3. One-Hot Encode Mechanics and Category Columns

3.2.1 Replacing Wrong Values

We found several variables that need to be checked and replaced with appropriate values, including:
* min_players
* max_players
* weight
* avg_time
* min_time
* max_time
* year

We located all cells with a value of 0, as well as cells whose values did not make sense, looked up the correct information on the original website, and overwrote them. We also recoded some continuous variables as categorical.
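As a rough illustration of the search for zero-valued cells, here is a minimal sketch on a toy data frame. The column names follow the report, but the values and the object names `toy`, `check_cols`, `zero_counts`, and `rows_to_check` are made up for the example.

```r
# Toy data with the same column names as the report; the values are made up.
toy <- data.frame(
  names       = c("Game A", "Game B", "Game C"),
  min_players = c(2, 0, 1),
  max_players = c(4, 0, 6),
  avg_time    = c(60, 30, 0)
)

check_cols <- c("min_players", "max_players", "avg_time")

# Number of zero entries in each column
zero_counts <- colSums(toy[check_cols] == 0, na.rm = TRUE)

# Row indices that contain at least one zero and need a manual lookup
rows_to_check <- which(rowSums(toy[check_cols] == 0, na.rm = TRUE) > 0)
toy[rows_to_check, ]
```

The row indices found this way are the kind of positions that the hard-coded assignments in the cleaning code refer to.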

game <- read.csv("bgg_data.csv")

### Overwriting nonsensical cells
# min_players
game$min_players[c(1616, 1962, 2273, 2408, 2693, 3332, 3850, 3902, 4145,
                   4247, 4311, 4442, 4480, 4815)] <- 1

# max_players
game$max_players[c(1616, 1962, 2273, 2408, 3902, 4145, 4442, 4480, 4815)] <- NA
game$max_players[3332] <- 1
game$max_players[c(2687, 2818, 3570, 3724, 3875, 4016, 4241, 4354, 4437,
                   4504, 4528, 4540, 4795, 4988)] <- 2
game$max_players[c(2739, 3356)] <- 4

# weight
game$weight[c(1477, 4381, 4521)] <- NA

# min_time, avg_time and max_time are cleaned in excel and re-imported back
game <- read.csv("bgg_clean_dat.csv", sep = " ", header = T)

3.2.2 Recoding

We tried to recode the following:
* min_players - to categories single player, multi-player or party game
* max_players - to categories single player, multi-player or party game
* min_time - to categories 0 (short), 1 (medium), 2 (long)
* avg_time - to categories 0 (short), 1 (medium), 2 (long)
* weight - to categories 0 (easy), 1 (medium), 2 (hard)

# recode players
game$single_player = 0
game$single_player[game$min_players == 1] = 1
game$multi_player = 0
game$multi_player[game$min_players > 1 & game$max_players <= 4] = 1
game$party_player = 0
game$party_player[game$max_players > 4] = 1
# recode min_time
# 0 = short,  1 = medium, 2 = long
quantile(game$min_time, na.rm = T) 
##    0%   25%   50%   75%  100% 
##     1    30    45    90 17280
game$cate_mintime = 0
game$cate_mintime[game$min_time >= 30 & game$min_time <= 90] = 1
game$cate_mintime[game$min_time > 90] = 2
# recode avg_time
# 0 = short,  1 = medium, 2 = long
quantile(game$avg_time, na.rm = T)
##    0%   25%   50%   75%  100% 
##     1    30    60   120 22500
game$cate_avgtime = 0
game$cate_avgtime[game$avg_time >= 30 & game$avg_time <= 120] = 1
game$cate_avgtime[game$avg_time > 120] = 2
# recode weight
# 0 = easy, 1 = medium, 2 = hard
quantile(game$weight, na.rm = T)
##      0%     25%     50%     75%    100% 
## 1.00000 1.73885 2.28915 2.88890 4.90480
game$cate_weight = 0
game$cate_weight[game$weight >= 1.73885 & game$weight <= 2.8889] = 1
game$cate_weight[game$weight > 2.8889] = 2

# write final cleaned csv and import csv
write.table(game, "bgg_final_clean_dat.csv", sep = "|")
game <- read.csv("bgg_final_clean_dat.csv", sep = "|", header = T)
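The three quartile-based recodes above share one pattern: 0 below the first quartile, 1 between the quartiles (inclusive), and 2 above the third quartile. A minimal sketch of that pattern as a reusable function follows; `recode_three_levels()` and the toy vector are hypothetical and not part of the original pipeline.

```r
# 0 if x < lower, 1 if lower <= x <= upper, 2 if x > upper; NA stays NA
recode_three_levels <- function(x, lower, upper) {
  as.integer(x >= lower) + as.integer(x > upper)
}

# Made-up play times in minutes, including one missing value
min_time <- c(10, 30, 45, 90, 120, NA)

# First and third quartiles of the toy data (30 and 90 here)
q <- quantile(min_time, probs = c(0.25, 0.75), na.rm = TRUE)

cate_mintime <- recode_three_levels(min_time, q[[1]], q[[2]])
cate_mintime
```

One difference from the code above is worth noting: because the report's columns are initialised to 0, a row with a missing value keeps the code 0 there, whereas this sketch leaves it as NA.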

3.2.3 One-Hot Encoding

Each board game can have multiple mechanics or categories. We split the mechanics and categories into separate indicator columns, so that each board game has a 0 or 1 for each mechanic and each category.

# recode mechanic
# find unique mechanics
mech_str <- paste(as.character(game$mechanic), collapse = ", ")
mech_unique <- unique(strsplit(mech_str, ", ")[[1]])
mech_unique_lower <- unlist(lapply(mech_unique, function(x) {paste(strsplit(tolower(x), " ")[[1]], collapse = "_")}))

# create one empty column for each unique mechanic
mechanic_col <- data.frame(matrix(0, ncol = length(mech_unique_lower), nrow = dim(game)[1]))
colnames(mechanic_col) <- mech_unique_lower

# fill in the values of the mechanic columns
fill_mech_col <- function(df, mechanic_col) {
  for (i in 1:dim(df)[1]) {
    mech_col_num <- which(mech_unique %in% c(strsplit(as.character(df$mechanic[i]), ", ")[[1]]))
    for (j in mech_col_num) {
      mechanic_col[i, j] <- 1
    }
  }
  return(mechanic_col)
}
mechanic_col <- fill_mech_col(game, mechanic_col)

# recode categories
# find unique categories
cat_str <- paste(as.character(game$category), collapse = ", ")
cat_unique <- unique(strsplit(cat_str, ", ")[[1]])
cat_unique_lower <- unlist(lapply(cat_unique, function(x) {paste(strsplit(tolower(x), " ")[[1]], collapse = "_")}))

# create one empty column for each unique category
cat_col <- data.frame(matrix(0, ncol = length(cat_unique_lower), nrow = dim(game)[1]))
colnames(cat_col) <- cat_unique_lower

# fill in the values of the category columns
fill_cat_col <- function(df, cat_col) {
  for (i in 1:dim(df)[1]) {
    cat_col_num <- which(cat_unique %in% c(strsplit(as.character(df$category[i]), ", ")[[1]]))
    for (j in cat_col_num) {
      cat_col[i, j] <- 1
    }
  }
  return(cat_col)
}

cat_col <- fill_cat_col(game, cat_col)

df_new <- cbind(game, mechanic_col)
write.table(df_new, 'df_w_mechanic', sep = "|")
df_new2 <- cbind(game, cat_col)
write.table(df_new2, 'df_w_cat', sep = "|")

df_w_mechanic <- read.csv('df_w_mechanic', sep = "|")
df_w_cat <- read.csv('df_w_cat', sep = "|")
drops <- c('none')
df_mech_new <- df_w_mechanic[ , !(names(df_w_mechanic) %in% drops)]
df_mech_new$memory_mechanic <- df_mech_new$memory
df_mech_final <- df_mech_new[ , !(names(df_mech_new) %in% 'memory')]
df_cat_new <- df_w_cat[ , !(names(df_w_cat) %in% drops)]
df_cat_final <- df_cat_new[, 27:109]
df_recode_final <- cbind(df_mech_final, df_cat_final)
write.table(df_recode_final, 'df_recode_final_1127', sep = "|")
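The same encoding can be sketched without the nested loops. The example below is a simplified illustration on toy data; the column name `mechanic` follows the report, while the values and the object names `labels_per_game`, `all_labels`, and `one_hot` are assumptions made for the example.

```r
# Toy data; the values are made up.
toy <- data.frame(
  names    = c("Game A", "Game B", "Game C"),
  mechanic = c("Dice Rolling, Hand Management", "Hand Management", "none"),
  stringsAsFactors = FALSE
)

# Split each comma-separated string into a character vector of labels.
labels_per_game <- strsplit(toy$mechanic, ", ", fixed = TRUE)
all_labels <- unique(unlist(labels_per_game))

# One row per game, one 0/1 column per label.
one_hot <- t(sapply(labels_per_game, function(x) as.integer(all_labels %in% x)))
colnames(one_hot) <- gsub(" ", "_", tolower(all_labels))
one_hot <- as.data.frame(one_hot)
one_hot
```

As in the report's code, a placeholder label such as `none` becomes its own column and can be dropped afterwards.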

Section 4. Exploratory Data Analysis

game <- read.csv("df_recode_final_1127", sep = "|")

4.1 Summary Plots

4.1.1 Mean Geek Rating and Mean Average Rating by Category/Theme of Games

Figure 1.1: Avg rating vs geek rating across game ranks by categories

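# load the packages used in the rest of the report
library(tidyverse)  # dplyr, tidyr, stringr, tibble, ggplot2
library(knitr)      # kable
library(grid)       # grid.newpage, grid.draw
library(caret)      # createDataPartition, trainControl, train

# loading game without separation of mech and cate into indicator variables
game1 <- read.csv("bgg_final_clean_dat.csv", sep = "|", header = T)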

# Functions
split_into_multiple <- function(column, pattern = ", ", into_prefix){
  cols <- str_split_fixed(column, pattern, n = Inf)
  cols[which(cols == "")] <- NA
  cols <- as_tibble(cols)
  m <- dim(cols)[2]
  names(cols) <- paste(into_prefix, 1:m, sep = "_")
  return(cols)
}

#Splitting 
game1 <- game1 %>% bind_cols(split_into_multiple(game1$category,',','category')) %>%
  bind_cols(split_into_multiple(game1$mechanic,',','mechanic'))

#Cleaning
game1 <- game1 %>% select(-category, -mechanic, -designer, -image_url)

#Tidying
tidygame <- game1 %>% gather(key, categories, category_1:category_11, na.rm = TRUE) %>% select(-key) %>%
  gather(key, mechanics, mechanic_1:mechanic_18, na.rm = TRUE) %>% select(-key)

tidygame$mechanics <- trimws(tidygame$mechanics)
tidygame$categories <- trimws(tidygame$categories)

#Categories vs ratings
tidygame %>% 
 group_by(categories) %>% summarize(avgrating = mean(avg_rating), avggeek = mean(geek_rating), avgrank = mean(rank)) %>%  
  ggplot() +
  geom_point(aes(reorder(categories, avgrank), avggeek, color = 'avg_geek'), size = 0.5) +
  geom_point(aes(categories, avgrating, color = 'avg_rating'), size = 0.5) + 
  scale_colour_manual(name="Rating", values=c(avg_geek="red", avg_rating="blue")) +
  theme(axis.text=element_text(size=8, angle = 60, hjust =1)) +
  ylab("Rating") +
  xlab("Rank") +
  ggtitle("Ratings by Categories") 

Figure 1.2: Avg rating vs geek rating across game ranks by mechanics

#Mechanics vs ratings
tidygame %>% group_by(mechanics) %>% summarize(avgrating = mean(avg_rating), avggeek = mean(geek_rating), avgrank = mean(rank)) %>%
  ggplot() +
  geom_point(aes(reorder(mechanics, avgrank), avggeek, color = 'avg_geek'), size = 0.5) +
  geom_point(aes(mechanics, avgrating, color = 'avg_rating'), size = 0.5) +
  scale_colour_manual(name="Rating", values=c(avg_geek="red", avg_rating="blue")) + #adds legend
  theme(axis.text=element_text(size=9, angle = 60, hjust = 1)) +
  ylab("Rating") +
  xlab("Rank") +
  ggtitle("Ratings by Mechanics") 

The plots above show that the geek ratings are lower than average ratings across all categories or mechanics in general.
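One plausible reason for this gap, stated here as an assumption rather than something documented in the data set, is that the geek rating behaves like a Bayesian average: a number of dummy votes at a middling rating are mixed in, which pulls games with few votes towards that middling value. A minimal sketch of such an average follows; `prior_mean` and `prior_votes` are hypothetical values chosen for illustration, not BoardGameGeek's actual parameters.

```r
# Hypothetical prior: the real weights used by the site are not given in this report.
bayes_average <- function(avg_rating, num_votes, prior_mean = 5.5, prior_votes = 100) {
  (avg_rating * num_votes + prior_mean * prior_votes) / (num_votes + prior_votes)
}

few_votes  <- bayes_average(avg_rating = 9, num_votes = 50)     # pulled strongly towards the prior
many_votes <- bayes_average(avg_rating = 9, num_votes = 50000)  # stays close to 9
c(few_votes, many_votes)
```

Under this assumption, a game with a very high average rating but few votes would still receive a modest geek rating.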

4.1.3 Plots and ranking among 3 Age Groups

Generate agecat from the quartiles of age

We then report the top 10 games in each age group, together with their average ratings and geek ratings. The age groups are based on the quartiles of age and are nominally 0-8, 9-12, and 13-21. The games that rate highest differ between age groups, and within every age group the games with the highest average rating differ from those with the highest geek rating.

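Q1 <- quantile(game$age, 0.25) 
Q3 <- quantile(game$age, 0.75) 
# range(0, Q1) returns the two endpoints c(0, Q1), so %in% matches only ages equal to 0 or Q1
# (and ages equal to Q1 or Q3 in the second test); every other age falls through to agecat 3
game <- game %>% mutate(agecat = ifelse (age %in% range(0,Q1), 1, ifelse(age %in% range(Q1, Q3), 2, 3)))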
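Because `range(0, Q1)` returns only the two endpoints, `%in%` tests for equality with those endpoints rather than membership in the interval. This is why a game with age 10 shows up with agecat 3 in the tables below. A minimal sketch of an interval-based grouping with `cut()` is given here; the ages are made up, and the cut points 8 and 12 are assumed from the 0-8 / 9-12 / 13-21 groups described in the text.

```r
age <- c(0, 5, 8, 10, 12, 14)   # made-up ages
Q1 <- 8                         # assumed first quartile
Q3 <- 12                        # assumed third quartile

# Endpoint test: only ages equal to 0 or 8 match
endpoint_match <- age %in% range(0, Q1)

# Interval-based groups: (-Inf, Q1] -> 1, (Q1, Q3] -> 2, (Q3, Inf) -> 3
agecat <- cut(age, breaks = c(-Inf, Q1, Q3, Inf), labels = FALSE)

data.frame(age, endpoint_match, agecat)
```

Applying interval-based groups to the full data would change which games fall into each group, so the top-10 tables would need to be regenerated.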

Top 10 games by avg_rating in each age group, with game_id:

# agecat == 1 when age %in% range(0,Q1)
game %>% 
  select(avg_rating, geek_rating, rank, age, agecat, game_id, names) %>%
  filter(agecat == 1) %>% 
  arrange(desc(avg_rating)) %>% 
  head(10) %>% kable
| avg_rating | geek_rating | rank | age | agecat | game_id | names |
|---|---|---|---|---|---|---|
| 8.74356 | 5.98863 | 2021 | 0 | 1 | 68820 | Enemy Action: Ardennes |
| 8.59714 | 5.68432 | 4007 | 0 | 1 | 39939 | The Battle of Fontenoy: 11 May, 1745 |
| 8.50652 | 5.66641 | 4257 | 0 | 1 | 223619 | Shadow War: Armageddon |
| 8.50000 | 5.63690 | 4786 | 0 | 1 | 193238 | Tunisia II |
| 8.49597 | 5.67612 | 4118 | 0 | 1 | 185380 | Exceed: Red Horizon – Satoshi & Mei Lien vs. Baelkhor & Morathi |
| 8.46923 | 5.84305 | 2693 | 0 | 1 | 99358 | Stonewall Jackson’s Way II |
| 8.46140 | 5.78569 | 3069 | 0 | 1 | 149620 | Advanced Squad Leader: Starter Kit Historical Module 1 – Decision at Elst |
| 8.44687 | 5.71536 | 3639 | 0 | 1 | 183578 | Wing Leader: Supremacy 1943-1945 |
| 8.42061 | 5.79244 | 3018 | 8 | 1 | 108018 | Riichi Mahjong |
| 8.41032 | 5.67778 | 4096 | 0 | 1 | 176596 | The Great Battles of Alexander: Macedonian Art of War |
# agecat == 2 when age %in% range(Q1,Q3)
game %>% 
  select(avg_rating, geek_rating, rank, age, agecat, game_id, names) %>%
  filter(agecat == 2) %>% 
  arrange(desc(avg_rating)) %>% 
  head(10) %>% kable
| avg_rating | geek_rating | rank | age | agecat | game_id | names |
|---|---|---|---|---|---|---|
| 9.08970 | 8.15151 | 5 | 12 | 2 | 174430 | Gloomhaven |
| 8.91346 | 5.66261 | 4334 | 12 | 2 | 220308 | Gaia Project |
| 8.85597 | 6.48439 | 868 | 12 | 2 | 192135 | Too Many Bones |
| 8.77167 | 5.75141 | 3319 | 12 | 2 | 173504 | The Greatest Day: Sword, Juno, and Gold Beaches |
| 8.52372 | 5.71214 | 3672 | 12 | 2 | 199904 | Pericles: The Peloponnesian Wars |
| 8.45000 | 5.63642 | 4795 | 12 | 2 | 174298 | Napoleon’s Last Gamble |
| 8.43974 | 5.70077 | 3796 | 12 | 2 | 163399 | Infinity: Operation Icestorm |
| 8.41381 | 5.66988 | 4203 | 12 | 2 | 193867 | 1822: The Railways of Great Britain |
| 8.40513 | 6.94830 | 373 | 12 | 2 | 200680 | Agricola (revised edition) |
| 8.40132 | 5.95793 | 2142 | 12 | 2 | 32989 | Axis Empires: Totaler Krieg! |
# agecat == 3 when age %in% range(Q3,)
game %>% 
  select(avg_rating, geek_rating, rank, age, agecat, game_id, names) %>%
  filter(agecat == 3) %>% 
  arrange(desc(avg_rating)) %>% 
  head(10) %>% kable
| avg_rating | geek_rating | rank | age | agecat | game_id | names |
|---|---|---|---|---|---|---|
| 9.33167 | 5.79078 | 3026 | 14 | 3 | 186751 | Mythic Battles: Pantheon |
| 9.14646 | 5.64691 | 4591 | 14 | 3 | 198985 | Day Night Z |
| 8.89899 | 7.28089 | 150 | 17 | 3 | 55690 | Kingdom Death: Monster |
| 8.85900 | 5.76596 | 3194 | 15 | 3 | 144574 | Last Chance for Victory |
| 8.82781 | 5.65702 | 4425 | 14 | 3 | 168537 | Pandemonium |
| 8.82278 | 5.70251 | 3771 | 13 | 3 | 178896 | Last Blitzkrieg |
| 8.72977 | 8.30744 | 2 | 14 | 3 | 182028 | Through the Ages: A New Story of Civilization |
| 8.71368 | 5.98688 | 2025 | 16 | 3 | 63170 | 1817 |
| 8.66905 | 8.48904 | 1 | 13 | 3 | 161936 | Pandemic Legacy: Season 1 |
| 8.60654 | 5.78242 | 3091 | 16 | 3 | 85424 | La Bataille de la Moscowa (third edition) |

Top 10 games by geek_rating in each age group, with game_id:

# agecat == 1 when age %in% range(0,Q1)
game %>% 
  select(avg_rating, geek_rating, rank, age, agecat, game_id, names) %>%
  filter(agecat == 1) %>% 
  arrange(desc(geek_rating)) %>% 
  head(10) %>% kable
| avg_rating | geek_rating | rank | age | agecat | game_id | names |
|---|---|---|---|---|---|---|
| 7.82454 | 7.69242 | 44 | 8 | 1 | 163412 | Patchwork |
| 8.05321 | 7.58639 | 61 | 8 | 1 | 194655 | Santorini |
| 7.67105 | 7.58506 | 62 | 8 | 1 | 30549 | Pandemic |
| 7.80825 | 7.57058 | 65 | 8 | 1 | 521 | Crokinole |
| 7.64090 | 7.50156 | 78 | 8 | 1 | 123260 | Suburbia |
| 7.59141 | 7.48291 | 84 | 8 | 1 | 14996 | Ticket to Ride: Europe |
| 7.66681 | 7.41349 | 102 | 8 | 1 | 31627 | Ticket to Ride: Nordic Countries |
| 7.67240 | 7.41118 | 104 | 8 | 1 | 188 | Go |
| 7.53092 | 7.39187 | 107 | 8 | 1 | 10630 | Memoir ’44 |
| 7.48190 | 7.38916 | 109 | 8 | 1 | 9209 | Ticket to Ride |
# agecat == 2 when age %in% range(Q1,Q3)
game %>% 
  select(avg_rating, geek_rating, rank, age, agecat, game_id, names) %>%
  filter(agecat == 2) %>% 
  arrange(desc(geek_rating)) %>% 
  head(10) %>% kable
| avg_rating | geek_rating | rank | age | agecat | game_id | names |
|---|---|---|---|---|---|---|
| 8.29627 | 8.15458 | 4 | 12 | 2 | 120677 | Terra Mystica |
| 9.08970 | 8.15151 | 5 | 12 | 2 | 174430 | Gloomhaven |
| 8.37791 | 8.06267 | 8 | 12 | 2 | 167791 | Terraforming Mars |
| 8.17949 | 8.00663 | 10 | 12 | 2 | 102794 | Caverna: The Cave Farmers |
| 8.11355 | 7.99721 | 11 | 12 | 2 | 84876 | The Castles of Burgundy |
| 8.08780 | 7.98030 | 12 | 12 | 2 | 3076 | Puerto Rico |
| 8.05431 | 7.96041 | 14 | 12 | 2 | 31260 | Agricola |
| 8.08381 | 7.92190 | 17 | 12 | 2 | 25613 | Through the Ages: A Story of Civilization |
| 8.29052 | 7.86813 | 19 | 12 | 2 | 193738 | Great Western Trail |
| 7.94284 | 7.86145 | 21 | 12 | 2 | 2651 | Power Grid |
# agecat == 3 when age %in% range(Q3,)
game %>% 
  select(avg_rating, geek_rating, rank, age, agecat, game_id, names) %>%
  filter(agecat == 3) %>% 
  arrange(desc(geek_rating)) %>% 
  head(10) %>% kable
| avg_rating | geek_rating | rank | age | agecat | game_id | names |
|---|---|---|---|---|---|---|
| 8.66905 | 8.48904 | 1 | 13 | 3 | 161936 | Pandemic Legacy: Season 1 |
| 8.72977 | 8.30744 | 2 | 14 | 3 | 182028 | Through the Ages: A New Story of Civilization |
| 8.35745 | 8.22021 | 3 | 13 | 3 | 12333 | Twilight Struggle |
| 8.53049 | 8.15037 | 6 | 14 | 3 | 187645 | Star Wars: Rebellion |
| 8.32419 | 8.08622 | 7 | 14 | 3 | 169786 | Scythe |
| 8.18761 | 8.02304 | 9 | 10 | 3 | 173346 | 7 Wonders Duel |
| 8.38607 | 7.96376 | 13 | 13 | 3 | 115746 | War of the Ring (Second Edition) |
| 8.13872 | 7.93931 | 15 | 14 | 3 | 96848 | Mage Knight Board Game |
| 8.14718 | 7.92347 | 16 | 14 | 3 | 170216 | Blood Rage |
| 8.19903 | 7.89284 | 18 | 14 | 3 | 164153 | Star Wars: Imperial Assault |

For each age group, take the 100 best-ranked games, then find the 10 most frequent mechanics among them.

# agecat == 1 when age %in% range(0,Q1)
top100_1 <- game %>% 
  filter(agecat == 1) %>% 
  arrange(rank) %>%
  mutate(new_rank = 1:n()) %>%
  filter(new_rank <= 100) 
mechanic <- top100_1[,27:76]
m1 <- mechanic %>% colSums() %>% sort(decreasing = T) %>% head(10)  
# agecat == 2 when age %in% range(Q1,Q3)
top100_2 <- game %>% 
  filter(agecat == 2) %>% 
  arrange(rank) %>%
  mutate(new_rank = 1:n()) %>%
  filter(new_rank <= 100) 
mechanic <- top100_2[,27:76]
m2 <- mechanic %>% colSums() %>% sort(decreasing = T) %>% head(10)  
# agecat == 3 when age %in% range(Q3,)
top100_3 <- game %>% 
  filter(agecat == 3) %>% 
  arrange(rank) %>%
  mutate(new_rank = 1:n()) %>%
  filter(new_rank <= 100) 
mechanic <- top100_3[,27:76]
m3 <- mechanic %>% colSums() %>% sort(decreasing = T) %>% head(10)  
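The three blocks above repeat the same steps for each age group. A minimal sketch of those steps as one function is shown below; `top_features()` and the toy data are hypothetical, and the sketch assumes the indicator columns are selected by name rather than by position.

```r
# Count how often each indicator column is 1 among the best-ranked games of one group.
top_features <- function(df, group, feature_cols, n_games = 100, n_top = 10) {
  sub <- df[df$agecat == group, ]
  sub <- head(sub[order(sub$rank), ], n_games)           # best-ranked games first
  counts <- colSums(sub[, feature_cols, drop = FALSE])   # frequency of each feature
  head(sort(counts, decreasing = TRUE), n_top)
}

# Toy data; the values are made up.
toy <- data.frame(
  rank            = 1:6,
  agecat          = c(1, 1, 1, 2, 2, 2),
  dice_rolling    = c(1, 1, 0, 0, 0, 1),
  hand_management = c(0, 1, 0, 1, 1, 1)
)
cols <- c("dice_rolling", "hand_management")

res1 <- top_features(toy, group = 1, feature_cols = cols, n_games = 2, n_top = 1)
res2 <- top_features(toy, group = 2, feature_cols = cols, n_games = 3, n_top = 1)
res1
res2
```

Selecting columns by name is a design choice in the sketch: it stays valid if the column order of the data frame changes, which positional ranges such as 27:76 do not.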

We plotted the 10 most frequent game mechanics and categories among the top-ranked games in each age group. The three age groups share similar sets of preferred mechanics and categories.

Figure 3.1: Top 10 Ranked Game Mechanics for each Age Group

m11 <- data.frame(names=names(m1), m1)
m22 <- data.frame(names=names(m2), m2)
m33 <- data.frame(names=names(m3), m3)
p1 <- m11 %>% mutate(freq = m1) %>%
  ggplot () +
  geom_bar(aes(x = reorder(names, freq), y= freq), stat = "identity", fill= "#ffcccc") +
  theme_light() +
  theme(axis.text=element_text(size=8)) +
  xlab('Age 0-8') +
  ylab('Frequency') +
  ylim(0,50)+
  coord_flip() 
p2 <- m22 %>% mutate(freq = m2) %>%
  ggplot () +
  geom_bar(aes(x = reorder(names, freq), y= freq), stat = "identity", fill= "#ff9999") +
  theme_light() +
  theme(axis.text=element_text(size=8)) +
  xlab('Age 9-12') +
  ylab('Frequency') +
  ylim(0,50)+
  coord_flip()
p3 <-  m33 %>% mutate(freq = m3) %>%
  ggplot () +
  geom_bar(aes(x = reorder(names, freq), y= freq), stat = "identity", fill= "#ff4d4d") +
  theme_light() +
  theme(axis.text=element_text(size=8)) +
  xlab('Age 13-21') +
  ylab('Frequency') +
  ylim(0,50)+
  coord_flip()
grid.newpage()
grid.draw(rbind(ggplotGrob(p1), ggplotGrob(p2),  ggplotGrob(p3),size = "last"))

Figure 3.2: Top 10 Ranked Game Categories for each Age Group

c1 <- top100_1[,78:160] %>% colSums() %>% sort(decreasing = T) %>% head(10)  
c2 <- top100_2[,78:160] %>% colSums() %>% sort(decreasing = T) %>% head(10)  
c3 <- top100_3[,78:160] %>% colSums() %>% sort(decreasing = T) %>% head(10)  

c11 <- data.frame(names=names(c1), c1)
c22 <- data.frame(names=c(names(c2)[1:7], 'manufacturing', names(c2)[9:10]), c2)
c33 <- data.frame(names=names(c3), c3)


pc1 <- c11 %>% mutate(freq = c1) %>%
  ggplot () +
  geom_bar(aes(x = reorder(names, freq), y= freq), stat = "identity", fill= "#ffcccc") +
  theme(axis.text=element_text(size=8)) +
  xlab('Age 0-8') +
  ylab('Frequency') +
  ylim(0,40)+
  coord_flip() 
 

pc2 <- c22 %>% mutate(freq = c2) %>%
  ggplot () +
  geom_bar(aes(x = reorder(names, freq), y= freq), stat = "identity", fill= "#ff9999") +
  theme(axis.text=element_text(size=8)) +
  xlab('Age 9-12') +
  ylab('Frequency') +
  ylim(0,40)+
  coord_flip()

pc3 <-  c33 %>% mutate(freq = c3) %>%
  ggplot () +
  geom_bar(aes(x = reorder(names, freq), y= freq), stat = "identity", fill= "#ff4d4d") +
  theme(axis.text=element_text(size=8)) +
  xlab('Age 13-21') +
  ylab('Frequency') +
  ylim(0,40)+
  coord_flip()

grid.newpage()
grid.draw(rbind(ggplotGrob(pc1), ggplotGrob(pc2),  ggplotGrob(pc3),size = "last"))

4.1.5 Explore the Optimal Game Difficulty

Figure 5: Game Difficulty and Rating

# average rating and difficulty
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2")
df_recode_diff <- df_recode_final
df_recode_diff$min_time_category <- df_recode_final$cate_mintime
df_recode_diff %>%
  ggplot() +
  geom_point(aes(x = weight, y = avg_rating, col = factor(min_time_category))) +
  xlab("Game Difficulty Level") +
  ylab("Average Rating") +
  ggtitle("Relationship of Game Difficulty and Rating") + 
  scale_color_manual("Game Length", values=cbPalette[1:3])
## Warning: Removed 3 rows containing missing values (geom_point).

We noticed that the average rating tends to increase with the difficulty level and that, as expected, more difficult games take longer to play.

4.1.6 Explore Beta Coefficients from Linear Regression Model

First, we estimated the beta coefficients of the game categories (a linear model with category as the only predictor) and visualized them. This is a preliminary regression of rating (geek_rating in the code) on category; the error bars represent the 95% confidence interval of each coefficient estimate. We found that several game categories, such as Environmental, Medical, Farming, and Civilization, are positively associated with rating.

Second, we estimated the beta coefficients of the game mechanics (a linear model with mechanics as the only predictor) and visualized them. In this similar regression of rating on mechanics, several mechanics, such as Worker Placement, Grid Movement, Variable Phase Order, and Role Playing, are positively associated with rating.

Figure 6.1: Beta Coefficients of Game Categories (linear model with categories variable only)

model1 <- lm(geek_rating ~ categories,data = tidygame)
coeff <- as.data.frame(model1$coefficients)
coeff$category <- as.factor(rownames(coeff))
colnames(coeff) <- c("coef","covariates")
confint_cat <- as.data.frame(confint(model1)) 

coeff_cat <- coeff[,]
coeff_cat$covariates <- gsub("categories", "", coeff_cat$covariates)
head(coeff_cat)
##                                      coef           covariates
## (Intercept)                     6.0280847          (Intercept)
## categoriesAction / Dexterity   -0.0845758   Action / Dexterity
## categoriesAdventure             0.3181357            Adventure
## categoriesAge of Reason         0.1628377        Age of Reason
## categoriesAmerican Civil War   -0.1357028   American Civil War
## categoriesAmerican Indian Wars  0.3046220 American Indian Wars
coeff_cat_error <- bind_cols(coeff_cat, confint_cat) 
colnames(coeff_cat_error) <- c("coef", "covariates", "lower", "upper")

ggplot(coeff_cat_error[-1,], aes( x=reorder(covariates, coef)))+
  geom_errorbar(aes(x =reorder(covariates, coef), ymin = lower, ymax = upper), color = "grey70") +
  geom_point(aes(y = coef), col = "#C1275C") +
  coord_flip() +
  labs(title = "Linear Model Coefficients for Category",y="Coefficients",x="Category")+
  scale_fill_gradient2(low = "light grey", mid = "grey70",
  high = "#C1275C", midpoint = 0.25) +
  theme_light() +
  guides(fill=guide_legend(title="Coefficient values")) +
  theme(axis.text=element_text(size=6))

Figure 6.2: Beta Coefficients of Game Mechanics (linear model with mechanics variable only)

model2 <- lm(geek_rating ~ mechanics, data = tidygame)
coeff <- as.data.frame(model2$coefficients)
coeff$category <- as.factor(rownames(coeff))
colnames(coeff) <- c("coef","covariates")
confint_mech <- as.data.frame(confint(model2)) 

coeff_mech <- coeff[,]
coeff_mech$covariates <- gsub("mechanics", "", coeff_mech$covariates)
coeff_mech_error <- bind_cols(coeff_mech, confint_mech) 
colnames(coeff_mech_error) <- c("coef", "covariates", "lower", "upper")

ggplot(coeff_mech_error[-1,], aes( x=reorder(covariates, coef)))+
  geom_errorbar(aes(x =reorder(covariates, coef), ymin = lower, ymax = upper), color = "grey70") +
  geom_point(aes(y = coef), col = "#C1275C") +
  coord_flip() +
  labs(title = "Linear Model Coefficients for Mechanics",y="Coefficients",x="Mechanics")+
  scale_fill_gradient2(low = "light grey", mid = "grey70",
  high = "#C1275C", midpoint = 0.25) +
  theme_light() +
  guides(fill=guide_legend(title="Coefficient values")) +
  theme(axis.text=element_text(size=8))
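For readers who want to see how the coefficient table behind these plots is put together, here is a compact sketch on simulated data. The category names, the simulated ratings, and the object names `fit` and `coef_table` are assumptions for the example and do not reproduce the results above.

```r
set.seed(1)

# Simulated data: three made-up categories, with "Wargame" rated 0.5 higher on average.
toy <- data.frame(category = rep(c("Abstract", "Card Game", "Wargame"), each = 20))
toy$rating <- 6 + 0.5 * (toy$category == "Wargame") + rnorm(60, sd = 0.3)

fit <- lm(rating ~ category, data = toy)

# coef() and confint() return rows in the same order, so they can be combined directly.
ci <- confint(fit)   # 95% confidence intervals by default
coef_table <- data.frame(
  covariate = names(coef(fit)),
  coef      = unname(coef(fit)),
  lower     = ci[, 1],
  upper     = ci[, 2]
)
coef_table
```

With a factor predictor, the alphabetically first level is absorbed into the intercept, so each remaining coefficient is the difference in mean rating relative to that reference level.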

4.2 Board Game Evolution

4.2.2 Change in Game Mechanics and Themes Over the Years

Finally, we plotted the change in game mechanics and themes over the years 1980-2018.

For the mechanics, we found that hand management became one of the dominant mechanics over the past few decades, while dice rolling remained popular throughout the period.

For the themes (categories), we noticed that war games were very popular in the 1980s but are much less so now, that card games gradually gained popularity, and that many fantasy games emerged in the last 10 years.

Figure 2.1: Change in game mechanics

# Change in game mechanics
# filter year to be on or after 1980
df_new1 <- game %>% filter(year >= 1980)

# group years into groups of 5-year intervals
df_new1$year_group <- cut(df_new1$year, breaks = c(1980, 1985, 1990, 1995, 2000, 2005, 2010, 2015, 2018), include.lowest = TRUE)

# get percentage of games with each top mechanic
dice_rolling <- df_new1 %>%
    group_by(year_group) %>%
    summarize(percent = sum(dice_rolling, na.rm = TRUE) / n()) %>%
    mutate(mechanic = "dice_rolling")
hand_management <- df_new1 %>%
    group_by(year_group) %>%
    summarise(percent = sum(hand_management, na.rm = TRUE) / n()) %>%
    mutate(mechanic = "hand_management")
variable_player_powers <- df_new1 %>%
    group_by(year_group) %>%
    summarise(percent = sum(variable_player_powers, na.rm = TRUE) / n()) %>%
    mutate(mechanic = "variable_player_powers")
set_collection <- df_new1 %>%
    group_by(year_group) %>%
    summarise(percent = sum(set_collection, na.rm = TRUE) / n()) %>%
    mutate(mechanic = "set_collection")
area_control_._area_influence <- df_new1 %>%
    group_by(year_group) %>%
    summarise(percent = sum(area_control_._area_influence, na.rm = TRUE) / n()) %>%
    mutate(mechanic = "area_control_._area_influence")
card_drafting <- df_new1 %>%
    group_by(year_group) %>%
    summarise(percent = sum(card_drafting, na.rm = TRUE) / n()) %>%
    mutate(mechanic = "card_drafting")

# plot
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2")
ggplot() + 
  geom_point(aes(x = dice_rolling$year_group, y = dice_rolling$percent, col = 'Dice Rolling')) + 
  geom_line(aes(x = dice_rolling$year_group, y = dice_rolling$percent, group = 1, col = 'Dice Rolling')) +
  geom_point(aes(x = hand_management$year_group, y = hand_management$percent, col = 'Hand Management')) + 
  geom_line(aes(x = hand_management$year_group, y = hand_management$percent, group = 1, col = 'Hand Management')) + 
  geom_point(aes(x = variable_player_powers$year_group, y = variable_player_powers$percent, col = 'Variable Player Powers')) +
  geom_line(aes(x = variable_player_powers$year_group, y = variable_player_powers$percent, group = 1, col = 'Variable Player Powers')) + 
  geom_point(aes(x = set_collection$year_group, y = set_collection$percent, col = 'Set Collection')) + 
  geom_line(aes(x = set_collection$year_group, y = set_collection$percent, group = 1, col = 'Set Collection')) +
  geom_point(aes(x = area_control_._area_influence$year_group, y = area_control_._area_influence$percent, col = 'Area Control/Area Influence')) + 
  geom_line(aes(x = `area_control_._area_influence`$year_group, y = area_control_._area_influence$percent, group = 1, col = 'Area Control/Area Influence')) +
  geom_point(aes(x = card_drafting$year_group, y = card_drafting$percent, col = 'Card Drafting')) + 
  geom_line(aes(x = card_drafting$year_group, y = card_drafting$percent, group = 1, col = 'Card Drafting')) +
  scale_colour_manual("", 
                      breaks = c("Dice Rolling", "Hand Management", "Variable Player Powers", "Set Collection", "Area Control/Area Influence", "Card Drafting"),
                      values = cbPalette[1:6]) +
  scale_x_discrete(breaks = dice_rolling$year_group, 
                   labels = seq(1980, 2015, 5)) +
  xlab("Year") +
  ylab("Proportion of games") +
  ggtitle("Evolution of Game Mechanics 1980 - 2018") +
  theme(legend.position="bottom")
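The per-mechanic summaries above all compute the same quantity: the share of games in each five-year group that use a given mechanic. A minimal sketch that computes several shares in one call is given below, using base R's `aggregate()` on toy data; the values and the object name `shares` are made up.

```r
# Toy data; the values are made up.
toy <- data.frame(
  year            = c(1981, 1984, 1988, 1990, 1992, 1995),
  dice_rolling    = c(1, 0, 1, 1, 0, 0),
  hand_management = c(0, 0, 0, 1, 1, 1)
)
toy$year_group <- cut(toy$year, breaks = c(1980, 1985, 1990, 1995), include.lowest = TRUE)

# The mean of a 0/1 indicator is the share of games with that mechanic.
shares <- aggregate(cbind(dice_rolling, hand_management) ~ year_group,
                    data = toy, FUN = mean)
shares
```

Note that the formula interface of `aggregate()` drops rows with missing values before averaging, whereas the code above divides by the full group size, so the two can differ when indicators are missing.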

Figure 2.2: Change in Game Categories

# change in game categories
# get percentage of games with top category
card_game <- df_new1 %>%
    group_by(year_group) %>%
    summarize(percent = sum(card_game, na.rm = TRUE) / n()) %>%
    mutate(mechanic = "card_game")
wargame <- df_new1 %>%
    group_by(year_group) %>%
    summarise(percent = sum(wargame, na.rm = TRUE) / n()) %>%
    mutate(mechanic = "wargame")
fantasy <- df_new1 %>%
    group_by(year_group) %>%
    summarise(percent = sum(fantasy, na.rm = TRUE) / n()) %>%
    mutate(mechanic = "fantasy")
economic <- df_new1 %>%
    group_by(year_group) %>%
    summarise(percent = sum(economic, na.rm = TRUE) / n()) %>%
    mutate(mechanic = "economic")
fighting <- df_new1 %>%
    group_by(year_group) %>%
    summarise(percent = sum(fighting, na.rm = TRUE) / n()) %>%
    mutate(mechanic = "fighting")
science_fiction <- df_new1 %>%
    group_by(year_group) %>%
    summarise(percent = sum(science_fiction, na.rm = TRUE) / n()) %>%
    mutate(mechanic = "science_fiction")

# plot
cbPalette <- c("#999999", "#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2")
ggplot() + 
  geom_point(aes(x = card_game$year_group, y = card_game$percent, col = 'Card Game')) + 
  geom_line(aes(x = card_game$year_group, y = card_game$percent, group = 1, col = 'Card Game')) +
  geom_point(aes(x = wargame$year_group, y = wargame$percent, col = 'War Game')) + 
  geom_line(aes(x = wargame$year_group, y = wargame$percent, group = 1, col = 'War Game')) + 
  geom_point(aes(x = fantasy$year_group, y = fantasy$percent, col = 'Fantasy')) +
  geom_line(aes(x = fantasy$year_group, y = fantasy$percent, group = 1, col = 'Fantasy')) + 
  geom_point(aes(x = economic$year_group, y = economic$percent, col = 'Economic')) + 
  geom_line(aes(x = economic$year_group, y = economic$percent, group = 1, col = 'Economic')) +
  geom_point(aes(x = fighting$year_group, y = fighting$percent, col = 'Fighting')) + 
  geom_line(aes(x = fighting$year_group, y = fighting$percent, group = 1, col = 'Fighting')) +
  geom_point(aes(x = science_fiction$year_group, y = science_fiction$percent, col = 'Science Fiction')) + 
  geom_line(aes(x = science_fiction$year_group, y = science_fiction$percent, group = 1, col = 'Science Fiction')) +
  scale_colour_manual("", 
                      breaks = c("Card Game", "War Game", "Fantasy", "Economic", "Fighting", "Science Fiction"),
                      values = cbPalette[1:6]) +
  scale_x_discrete(breaks = card_game$year_group, 
                   labels = seq(1980, 2015, 5)) +
  xlab("Year") +
  ylab("Proportion of games") +
  ggtitle("Evolution of Game Categories 1980 - 2018") + 
  theme(legend.position="bottom")


Section 5. Predictions with Machine Learning

5.1 Objective

We want to predict the success of a board game which is measured by its average rating on boardgamegeek.com.

5.2 Methods

We tried four machine learning methods: linear regression, k-nearest neighbors, random forest, and support vector machine.

Building the training and test sets

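# import data
library(caret)  # createDataPartition(), trainControl(), train()
library(dplyr)  # slice(), select()
library(tidyr)  # drop_na()
set.seed(1)
game <- read.csv("df_recode_final_1127", header = TRUE, sep = "|")

# drop irrelevant columns (game ID, names, designer, ...), then drop rows with NAs
g1 <- game[, -c(2:4, 14, 16, 18:19, 77)]  # dim: 4750 x 152
g <- drop_na(g1)                          # dim: 4666 x 152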

# Splitting the data into training and test sets with createDataPartition() from caret
inTrain <- createDataPartition(y = g$avg_rating,
                               p = 0.8)$Resample1
train_set <- slice(g, inTrain)
test_set <- slice(g, -inTrain)

control <- trainControl(method = 'cv', number = 20)
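
trainControl(method = 'cv', number = 20) asks caret for 20-fold cross-validation: the training set is split into 20 folds, each fold is held out once while the model is fit on the rest, and the held-out errors are averaged. The sketch below shows the same idea by hand in base R on the built-in mtcars data, using 5 folds to suit the small data set. The formula and the helper rmse() are illustrative assumptions, and caret handles fold assignment internally, so this only approximates what train() does.

```r
set.seed(1)

# Root mean squared error between observed and predicted values
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Randomly assign each row to one of k folds
k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

# Fit on k - 1 folds, evaluate on the held-out fold
fold_rmse <- sapply(seq_len(k), function(i) {
  fit <- lm(mpg ~ wt + hp, data = mtcars[folds != i, ])
  held_out <- mtcars[folds == i, ]
  rmse(held_out$mpg, predict(fit, newdata = held_out))
})

# Cross-validated RMSE: the average over the held-out folds
cv_rmse <- mean(fold_rmse)
print(cv_rmse)
```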

5.2.1 Linear Regression

# Fit a linear model on all covariates with caret to see which ones are significant
model <- train(avg_rating ~ .,
             data = train_set,
             method = "lm",
             na.action=na.exclude,
             trControl = control,
             metric = "RMSE")
summary(model)
## 
## Call:
## lm(formula = .outcome ~ ., data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.27256 -0.24178 -0.03963  0.20882  1.96684 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    2.705e+00  2.518e-01  10.742  < 2e-16 ***
## rank                           6.808e-06  9.737e-06   0.699 0.484474    
## min_players                   -6.909e-03  1.611e-02  -0.429 0.668119    
## max_players                    3.831e-04  3.639e-04   1.053 0.292549    
## avg_time                       5.154e-03  9.482e-04   5.436 5.80e-08 ***
## min_time                      -1.405e-04  4.138e-05  -3.396 0.000691 ***
## max_time                      -4.995e-03  9.474e-04  -5.272 1.42e-07 ***
## year                           1.213e-05  4.854e-05   0.250 0.802641    
## geek_rating                    6.094e-01  3.502e-02  17.403  < 2e-16 ***
## num_votes                      8.335e-06  9.630e-06   0.866 0.386792    
## age                           -8.700e-03  2.174e-03  -4.002 6.41e-05 ***
## owned                         -2.012e-05  7.182e-06  -2.801 0.005122 ** 
## weight                         2.533e-01  2.194e-02  11.544  < 2e-16 ***
## single_player                  9.347e-02  3.751e-02   2.491 0.012765 *  
## multi_player                  -1.773e-02  3.734e-02  -0.475 0.634888    
## party_player                  -9.563e-02  3.476e-02  -2.751 0.005965 ** 
## cate_mintime                  -1.592e-01  1.964e-02  -8.109 6.85e-16 ***
## cate_avgtime                   1.082e-01  2.107e-02   5.136 2.95e-07 ***
## cate_weight                    5.263e-02  2.139e-02   2.461 0.013905 *  
## action_point_allowance_system  5.804e-02  2.487e-02   2.333 0.019678 *  
## co.operative_play              1.376e-01  3.014e-02   4.566 5.13e-06 ***
## hand_management               -1.272e-03  1.650e-02  -0.077 0.938561    
## point_to_point_movement        2.455e-03  2.915e-02   0.084 0.932898    
## set_collection                -1.373e-02  1.928e-02  -0.712 0.476529    
## trading                       -4.876e-02  4.075e-02  -1.197 0.231534    
## variable_player_powers         2.353e-02  2.038e-02   1.155 0.248347    
## auction.bidding               -7.090e-02  2.504e-02  -2.831 0.004661 ** 
## card_drafting                  4.793e-03  2.073e-02   0.231 0.817175    
## area_control_._area_influence -2.340e-02  2.132e-02  -1.098 0.272469    
## campaign_._battle_card_driven  8.760e-02  3.535e-02   2.478 0.013243 *  
## dice_rolling                  -2.046e-02  1.814e-02  -1.128 0.259401    
## simultaneous_action_selection -2.242e-02  2.525e-02  -0.888 0.374627    
## route.network_building         4.323e-02  3.625e-02   1.193 0.233127    
## variable_phase_order           7.600e-02  3.916e-02   1.941 0.052367 .  
## action_._movement_programming -7.939e-03  4.185e-02  -0.190 0.849565    
## grid_movement                  3.267e-02  2.859e-02   1.143 0.253225    
## modular_board                 -5.434e-02  2.216e-02  -2.452 0.014242 *  
## storytelling                   2.242e-01  5.618e-02   3.991 6.71e-05 ***
## area_movement                 -4.554e-02  2.727e-02  -1.670 0.094975 .  
## tile_placement                -1.502e-02  2.259e-02  -0.665 0.506075    
## worker_placement               3.595e-02  2.934e-02   1.225 0.220522    
## deck_._pool_building           2.139e-01  3.070e-02   6.969 3.74e-12 ***
## role_playing                  -2.092e-02  4.174e-02  -0.501 0.616198    
## partnerships                   5.404e-03  2.847e-02   0.190 0.849489    
## pick.up_and_deliver           -1.036e-01  3.820e-02  -2.711 0.006737 ** 
## player_elimination             1.286e-04  3.648e-02   0.004 0.997188    
## secret_unit_deployment        -5.937e-02  3.627e-02  -1.637 0.101753    
## pattern_recognition           -1.134e-01  5.122e-02  -2.214 0.026898 *  
## press_your_luck                4.000e-02  3.447e-02   1.161 0.245896    
## time_track                    -9.216e-03  7.360e-02  -0.125 0.900351    
## voting                        -3.797e-02  4.833e-02  -0.786 0.432125    
## area.impulse                   1.639e-01  8.581e-02   1.910 0.056214 .  
## hex.and.counter                8.005e-02  3.210e-02   2.494 0.012669 *  
## area_enclosure                -1.205e-01  5.138e-02  -2.345 0.019068 *  
## pattern_building              -1.990e-02  3.974e-02  -0.501 0.616663    
## take_that                      3.401e-03  3.676e-02   0.093 0.926297    
## stock_holding                  1.673e-01  4.989e-02   3.355 0.000803 ***
## commodity_speculation         -1.241e-01  4.833e-02  -2.569 0.010243 *  
## simulation                     6.762e-02  2.867e-02   2.358 0.018408 *  
## betting.wagering               4.237e-02  4.783e-02   0.886 0.375721    
## trick.taking                  -2.629e-02  4.899e-02  -0.537 0.591588    
## line_drawing                  -7.568e-03  9.270e-02  -0.082 0.934936    
## rock.paper.scissors           -1.719e-01  6.937e-02  -2.478 0.013264 *  
## roll_._spin_and_move          -7.899e-02  4.396e-02  -1.797 0.072412 .  
## paper.and.pencil              -6.102e-02  6.226e-02  -0.980 0.327121    
## acting                         1.801e-01  7.277e-02   2.475 0.013369 *  
## singing                       -2.967e-01  1.818e-01  -1.632 0.102865    
## chit.pull_system               1.295e-01  5.422e-02   2.388 0.016992 *  
## crayon_rail_system             9.567e-02  1.131e-01   0.846 0.397771    
## environmental                  3.035e-02  7.312e-02   0.415 0.678117    
## medical                        1.900e-01  9.794e-02   1.940 0.052419 .  
## card_game                     -1.801e-03  1.879e-02  -0.096 0.923681    
## civilization                  -1.029e-02  3.769e-02  -0.273 0.784892    
## economic                      -6.635e-03  2.495e-02  -0.266 0.790260    
## modern_warfare                 4.369e-02  5.417e-02   0.807 0.419905    
## political                     -1.013e-01  3.699e-02  -2.738 0.006214 ** 
## wargame                        7.744e-02  3.000e-02   2.582 0.009872 ** 
## fantasy                        1.601e-02  2.158e-02   0.742 0.458174    
## territory_building             2.351e-02  3.387e-02   0.694 0.487658    
## adventure                     -3.558e-02  3.316e-02  -1.073 0.283360    
## exploration                   -3.738e-02  3.185e-02  -1.174 0.240543    
## fighting                       1.313e-02  2.407e-02   0.546 0.585407    
## miniatures                     1.836e-01  3.184e-02   5.768 8.69e-09 ***
## dice                           1.824e-02  2.905e-02   0.628 0.530134    
## movies_._tv_._radio_theme     -1.576e-02  3.838e-02  -0.411 0.681389    
## science_fiction               -1.169e-02  2.652e-02  -0.441 0.659543    
## industry_._manufacturing      -3.819e-02  4.535e-02  -0.842 0.399805    
## ancient                       -5.606e-02  2.968e-02  -1.889 0.058991 .  
## city_building                 -6.181e-02  3.102e-02  -1.993 0.046374 *  
## animals                        6.399e-03  3.078e-02   0.208 0.835300    
## farming                       -2.167e-02  5.415e-02  -0.400 0.689101    
## medieval                      -6.132e-02  2.594e-02  -2.364 0.018139 *  
## novel.based                   -1.247e-01  4.221e-02  -2.954 0.003152 ** 
## mythology                      4.640e-02  4.222e-02   1.099 0.271929    
## american_west                 -5.054e-02  5.332e-02  -0.948 0.343266    
## horror                        -1.512e-02  3.901e-02  -0.388 0.698290    
## murder.mystery                -4.433e-02  6.008e-02  -0.738 0.460619    
## puzzle                         6.101e-02  4.791e-02   1.273 0.202962    
## video_game_theme              -6.683e-03  5.628e-02  -0.119 0.905484    
## space_exploration             -1.423e-01  5.594e-02  -2.544 0.011007 *  
## collectible_components        -7.883e-02  4.701e-02  -1.677 0.093652 .  
## bluffing                       1.968e-02  2.846e-02   0.691 0.489326    
## transportation                 1.035e-02  4.506e-02   0.230 0.818337    
## religious                     -1.925e-02  7.223e-02  -0.266 0.789892    
## travel                        -9.957e-02  6.844e-02  -1.455 0.145782    
## nautical                      -7.166e-02  3.392e-02  -2.113 0.034691 *  
## deduction                      5.812e-02  3.611e-02   1.609 0.107597    
## party_game                     1.309e-01  3.531e-02   3.706 0.000214 ***
## spies.secret_agents            1.081e-01  5.738e-02   1.884 0.059596 .  
## word_game                      1.133e-02  5.979e-02   0.189 0.849718    
## mature_._adult                 1.205e-01  1.058e-01   1.139 0.254619    
## renaissance                   -3.733e-02  3.986e-02  -0.937 0.349042    
## zombies                       -3.100e-02  6.058e-02  -0.512 0.608934    
## negotiation                    7.657e-02  3.852e-02   1.988 0.046882 *  
## abstract_strategy             -3.715e-02  3.100e-02  -1.199 0.230738    
## prehistoric                   -1.644e-01  6.706e-02  -2.451 0.014296 *  
## arabian                       -4.630e-02  7.438e-02  -0.622 0.533667    
## aviation_._flight             -7.081e-02  4.747e-02  -1.492 0.135899    
## post.napoleonic                1.059e-01  8.375e-02   1.264 0.206203    
## trains                         5.994e-02  4.935e-02   1.215 0.224583    
## action_._dexterity             1.596e-01  3.857e-02   4.138 3.57e-05 ***
## world_war_i                    1.256e-01  6.476e-02   1.939 0.052578 .  
## world_war_ii                   1.155e-01  3.509e-02   3.291 0.001008 ** 
## comic_book_._strip             6.079e-02  5.807e-02   1.047 0.295308    
## racing                        -1.739e-02  3.862e-02  -0.450 0.652449    
## real.time                     -7.946e-02  3.829e-02  -2.075 0.038029 *  
## humor                         -6.121e-02  3.416e-02  -1.792 0.073199 .  
## electronic                    -2.417e-02  8.020e-02  -0.301 0.763122    
## book                          -4.749e-02  1.128e-01  -0.421 0.673667    
## civil_war                      2.128e-01  8.190e-02   2.599 0.009394 ** 
## expansion_for_base.game        4.463e-01  2.208e-01   2.021 0.043306 *  
## sports                         1.278e-01  4.407e-02   2.900 0.003747 ** 
## pirates                       -1.288e-02  5.023e-02  -0.256 0.797627    
## age_of_reason                 -3.489e-03  6.563e-02  -0.053 0.957603    
## american_indian_wars           1.035e-01  1.262e-01   0.820 0.412384    
## american_revolutionary_war     1.552e-01  1.058e-01   1.467 0.142429    
## educational                    1.710e-01  5.793e-02   2.953 0.003171 ** 
## memory                        -9.770e-02  6.127e-02  -1.595 0.110902    
## maze                           4.133e-02  7.514e-02   0.550 0.582284    
## napoleonic                     1.364e-01  5.801e-02   2.352 0.018749 *  
## print_._play                   1.686e-01  4.668e-02   3.612 0.000307 ***
## american_civil_war             1.604e-01  5.633e-02   2.848 0.004428 ** 
## children.s_game               -4.896e-02  3.990e-02  -1.227 0.219924    
## vietnam_war                   -4.852e-02  1.247e-01  -0.389 0.697204    
## pike_and_shot                 -6.680e-03  1.148e-01  -0.058 0.953599    
## mafia                         -6.818e-02  7.214e-02  -0.945 0.344642    
## trivia                        -1.173e-03  7.071e-02  -0.017 0.986765    
## number                         5.263e-02  9.444e-02   0.557 0.577371    
## game_system                    2.085e-01  1.127e-01   1.851 0.064228 .  
## korean_war                     9.008e-02  2.006e-01   0.449 0.653487    
## music                          8.991e-02  1.595e-01   0.564 0.573067    
## math                          -9.626e-02  1.973e-01  -0.488 0.625635    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3773 on 3774 degrees of freedom
## Multiple R-squared:  0.5651, Adjusted R-squared:  0.5477 
## F-statistic: 32.47 on 151 and 3774 DF,  p-value: < 2.2e-16
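
With about 150 coefficients, the summary table is easier to read once it is filtered. A minimal sketch, shown on the built-in mtcars data rather than the project data: coef(summary(fit)) returns the coefficient table as a matrix, which can be filtered and sorted by p-value. For the caret fit above, the underlying lm object should be available as model$finalModel; the 0.05 cutoff is an assumption chosen for illustration.

```r
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(fit))
coef_table <- coef_table[rownames(coef_table) != "(Intercept)", , drop = FALSE]

# Keep predictors with p < 0.05, ordered from most to least significant
p_values <- coef_table[, "Pr(>|t|)"]
significant <- coef_table[p_values < 0.05, , drop = FALSE]
significant <- significant[order(significant[, "Pr(>|t|)"]), , drop = FALSE]
print(significant)
```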
# Keep the outcome and the candidate predictors, then drop rows with NAs
tidygame <- train_set %>%
  select(avg_rating, year, weight, single_player, multi_player, hand_management,
         point_to_point_movement, set_collection, variable_player_powers,
         card_drafting, area_control_._area_influence,
         campaign_._battle_card_driven, dice_rolling,
         simultaneous_action_selection, route.network_building,
         variable_phase_order, grid_movement, storytelling, worker_placement,
         deck_._pool_building, player_elimination, press_your_luck,
         hex.and.counter, stock_holding, betting.wagering, line_drawing,
         rock.paper.scissors, environmental, card_game, economic, wargame,
         fighting, city_building, farming, murder.mystery) %>%
  na.omit()
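
The stepwise search in this section ranks candidate models by AIC. For linear models, step() relies on extractAIC(), which computes n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. The sketch below checks that formula on the built-in mtcars data and then plugs in RSS values read off the step() trace. The helper step_aic() is a hypothetical name, and n = 3926 is an assumption inferred from the summary output (3774 residual degrees of freedom plus 152 coefficients).

```r
# AIC as used by step() for lm fits: n * log(RSS / n) + 2 * edf
step_aic <- function(rss, n, edf) n * log(rss / n) + 2 * edf

# Check against extractAIC() on a small built-in example
fit <- lm(mpg ~ wt, data = mtcars)
manual <- step_aic(sum(residuals(fit)^2), nrow(mtcars), length(coef(fit)))
stopifnot(isTRUE(all.equal(manual, extractAIC(fit)[2])))

# RSS values read off the trace; n is inferred, not given in the report
n <- 3926
aic_null   <- step_aic(1235.12, n, edf = 1)  # intercept only; trace reports -4538.23
aic_weight <- step_aic(878.41,  n, edf = 2)  # adding weight;  trace reports -5874.24
print(c(aic_null, aic_weight))
```

Small differences from the trace are expected because the RSS values printed there are rounded to two decimals.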

# Stepwise selection
lm.null <- lm(avg_rating ~ 1, data = train_set)
lm.full <- lm(avg_rating ~ year + weight + single_player + multi_player +
                hand_management + point_to_point_movement + set_collection +
                variable_player_powers + card_drafting +
                area_control_._area_influence + campaign_._battle_card_driven +
                dice_rolling + simultaneous_action_selection +
                route.network_building + variable_phase_order + grid_movement +
                storytelling + worker_placement + deck_._pool_building +
                player_elimination + press_your_luck + hex.and.counter +
                stock_holding + betting.wagering + line_drawing +
                rock.paper.scissors + environmental + card_game + economic +
                wargame + fighting + city_building + farming + murder.mystery,
              data = train_set)
mod1 <- step(lm.null, direction = "both", scope = list(lower = lm.null, upper = lm.full))
## Start:  AIC=-4538.23
## avg_rating ~ 1
## 
##                                 Df Sum of Sq     RSS     AIC
## + weight                         1    356.71  878.41 -5874.2
## + wargame                        1     83.30 1151.82 -4810.4
## + single_player                  1     54.37 1180.75 -4713.0
## + hex.and.counter                1     53.00 1182.12 -4708.4
## + card_game                      1     32.16 1202.96 -4639.8
## + variable_player_powers         1     30.72 1204.40 -4635.1
## + deck_._pool_building           1     23.52 1211.60 -4611.7
## + dice_rolling                   1     22.84 1212.28 -4609.5
## + campaign_._battle_card_driven  1     21.58 1213.54 -4605.4
## + worker_placement               1     17.68 1217.44 -4592.8
## + economic                       1     17.40 1217.72 -4591.9
## + variable_phase_order           1     14.86 1220.26 -4583.8
## + grid_movement                  1     11.44 1223.68 -4572.8
## + set_collection                 1      9.27 1225.85 -4565.8
## + fighting                       1      8.24 1226.88 -4562.5
## + stock_holding                  1      5.76 1229.36 -4554.6
## + multi_player                   1      5.06 1230.06 -4552.4
## + point_to_point_movement        1      4.85 1230.27 -4551.7
## + route.network_building         1      3.90 1231.22 -4548.7
## + betting.wagering               1      3.44 1231.69 -4547.2
## + storytelling                   1      3.24 1231.89 -4546.5
## + card_drafting                  1      3.02 1232.10 -4545.9
## + area_control_._area_influence  1      2.59 1232.53 -4544.5
## + press_your_luck                1      2.10 1233.02 -4542.9
## + environmental                  1      1.90 1233.23 -4542.3
## + rock.paper.scissors            1      1.61 1233.51 -4541.4
## + farming                        1      0.75 1234.37 -4538.6
## <none>                                       1235.12 -4538.2
## + year                           1      0.60 1234.52 -4538.1
## + line_drawing                   1      0.51 1234.61 -4537.8
## + city_building                  1      0.26 1234.86 -4537.1
## + simultaneous_action_selection  1      0.19 1234.93 -4536.8
## + hand_management                1      0.16 1234.96 -4536.8
## + murder.mystery                 1      0.10 1235.03 -4536.5
## + player_elimination             1      0.01 1235.11 -4536.3
## 
## Step:  AIC=-5874.24
## avg_rating ~ weight
## 
##                                 Df Sum of Sq     RSS     AIC
## + single_player                  1     25.08  853.33 -5986.0
## + deck_._pool_building           1     16.39  862.03 -5946.2
## + variable_player_powers         1      8.61  869.80 -5910.9
## + storytelling                   1      7.23  871.19 -5904.7
## + campaign_._battle_card_driven  1      5.95  872.46 -5898.9
## + grid_movement                  1      5.79  872.62 -5898.2
## + dice_rolling                   1      3.39  875.02 -5887.4
## + card_drafting                  1      2.71  875.71 -5884.4
## + fighting                       1      2.28  876.13 -5882.5
## + player_elimination             1      1.95  876.46 -5881.0
## + hand_management                1      1.89  876.52 -5880.7
## + press_your_luck                1      1.82  876.60 -5880.4
## + wargame                        1      1.69  876.72 -5879.8
## + rock.paper.scissors            1      1.58  876.83 -5879.3
## + variable_phase_order           1      1.50  876.92 -5878.9
## + area_control_._area_influence  1      1.31  877.11 -5878.1
## + city_building                  1      0.88  877.53 -5876.2
## + economic                       1      0.65  877.76 -5875.2
## + worker_placement               1      0.52  877.89 -5874.6
## <none>                                        878.41 -5874.2
## + murder.mystery                 1      0.40  878.02 -5874.0
## + environmental                  1      0.36  878.05 -5873.9
## + point_to_point_movement        1      0.35  878.07 -5873.8
## + set_collection                 1      0.29  878.12 -5873.5
## + stock_holding                  1      0.29  878.13 -5873.5
## + route.network_building         1      0.23  878.18 -5873.3
## + year                           1      0.22  878.20 -5873.2
## + card_game                      1      0.16  878.26 -5872.9
## + multi_player                   1      0.13  878.29 -5872.8
## + farming                        1      0.08  878.33 -5872.6
## + line_drawing                   1      0.08  878.33 -5872.6
## + simultaneous_action_selection  1      0.05  878.37 -5872.4
## + hex.and.counter                1      0.01  878.41 -5872.3
## + betting.wagering               1      0.00  878.41 -5872.2
## - weight                         1    356.71 1235.12 -4538.2
## 
## Step:  AIC=-5985.98
## avg_rating ~ weight + single_player
## 
##                                 Df Sum of Sq     RSS     AIC
## + deck_._pool_building           1     13.33  840.00 -6045.8
## + variable_player_powers         1      6.67  846.66 -6014.8
## + multi_player                   1      6.54  846.79 -6014.2
## + grid_movement                  1      5.58  847.75 -6009.7
## + storytelling                   1      5.37  847.96 -6008.7
## + campaign_._battle_card_driven  1      4.75  848.58 -6005.9
## + card_drafting                  1      2.66  850.67 -5996.2
## + hand_management                1      2.38  850.95 -5995.0
## + fighting                       1      2.17  851.16 -5994.0
## + player_elimination             1      1.96  851.37 -5993.0
## + dice_rolling                   1      1.88  851.45 -5992.6
## + rock.paper.scissors            1      1.80  851.53 -5992.3
## + variable_phase_order           1      1.62  851.71 -5991.4
## + wargame                        1      1.34  851.99 -5990.2
## + press_your_luck                1      1.28  852.05 -5989.9
## + city_building                  1      0.86  852.47 -5987.9
## + murder.mystery                 1      0.44  852.89 -5986.0
## <none>                                        853.33 -5986.0
## + worker_placement               1      0.36  852.97 -5985.6
## + area_control_._area_influence  1      0.35  852.98 -5985.6
## + card_game                      1      0.22  853.11 -5985.0
## + economic                       1      0.22  853.12 -5985.0
## + point_to_point_movement        1      0.19  853.14 -5984.9
## + environmental                  1      0.16  853.17 -5984.7
## + simultaneous_action_selection  1      0.09  853.24 -5984.4
## + year                           1      0.09  853.24 -5984.4
## + betting.wagering               1      0.06  853.27 -5984.2
## + farming                        1      0.05  853.28 -5984.2
## + set_collection                 1      0.05  853.28 -5984.2
## + route.network_building         1      0.04  853.29 -5984.2
## + line_drawing                   1      0.03  853.30 -5984.1
## + stock_holding                  1      0.02  853.31 -5984.1
## + hex.and.counter                1      0.00  853.33 -5984.0
## - single_player                  1     25.08  878.41 -5874.2
## - weight                         1    327.42 1180.75 -4713.0
## 
## Step:  AIC=-6045.78
## avg_rating ~ weight + single_player + deck_._pool_building
## 
##                                 Df Sum of Sq     RSS     AIC
## + grid_movement                  1      5.93  834.07 -6071.6
## + storytelling                   1      5.76  834.24 -6070.8
## + variable_player_powers         1      5.66  834.34 -6070.3
## + multi_player                   1      5.59  834.42 -6070.0
## + campaign_._battle_card_driven  1      5.40  834.61 -6069.1
## + wargame                        1      2.65  837.35 -6056.2
## + dice_rolling                   1      2.33  837.67 -6054.7
## + rock.paper.scissors            1      2.10  837.91 -6053.6
## + variable_phase_order           1      1.84  838.16 -6052.4
## + player_elimination             1      1.84  838.17 -6052.4
## + fighting                       1      1.45  838.55 -6050.6
## + press_your_luck                1      1.33  838.67 -6050.0
## + hand_management                1      0.94  839.07 -6048.2
## + card_drafting                  1      0.92  839.08 -6048.1
## + city_building                  1      0.75  839.25 -6047.3
## + murder.mystery                 1      0.46  839.54 -6045.9
## <none>                                        840.00 -6045.8
## + area_control_._area_influence  1      0.33  839.68 -6045.3
## + worker_placement               1      0.31  839.69 -6045.2
## + point_to_point_movement        1      0.23  839.78 -6044.8
## + environmental                  1      0.20  839.80 -6044.7
## + simultaneous_action_selection  1      0.19  839.82 -6044.7
## + hex.and.counter                1      0.13  839.87 -6044.4
## + farming                        1      0.12  839.89 -6044.3
## + economic                       1      0.11  839.89 -6044.3
## + card_game                      1      0.11  839.89 -6044.3
## + betting.wagering               1      0.11  839.89 -6044.3
## + year                           1      0.04  839.96 -6044.0
## + line_drawing                   1      0.03  839.97 -6043.9
## + set_collection                 1      0.02  839.98 -6043.9
## + route.network_building         1      0.01  839.99 -6043.8
## + stock_holding                  1      0.00  840.00 -6043.8
## - deck_._pool_building           1     13.33  853.33 -5986.0
## - single_player                  1     22.03  862.03 -5946.2
## - weight                         1    322.88 1162.88 -4770.9
## 
## Step:  AIC=-6071.59
## avg_rating ~ weight + single_player + deck_._pool_building + 
##     grid_movement
## 
##                                 Df Sum of Sq     RSS     AIC
## + storytelling                   1      5.90  828.17 -6097.5
## + campaign_._battle_card_driven  1      5.63  828.44 -6096.2
## + multi_player                   1      4.67  829.40 -6091.6
## + variable_player_powers         1      3.96  830.12 -6088.3
## + wargame                        1      3.49  830.59 -6086.0
## + rock.paper.scissors            1      2.19  831.89 -6079.9
## + dice_rolling                   1      1.84  832.23 -6078.3
## + variable_phase_order           1      1.83  832.24 -6078.2
## + player_elimination             1      1.54  832.53 -6076.9
## + press_your_luck                1      1.29  832.79 -6075.6
## + card_drafting                  1      1.17  832.91 -6075.1
## + hand_management                1      1.05  833.02 -6074.5
## + fighting                       1      0.72  833.35 -6073.0
## + city_building                  1      0.65  833.43 -6072.6
## + murder.mystery                 1      0.49  833.58 -6071.9
## + worker_placement               1      0.48  833.59 -6071.9
## <none>                                        834.07 -6071.6
## + hex.and.counter                1      0.35  833.72 -6071.2
## + point_to_point_movement        1      0.33  833.74 -6071.2
## + area_control_._area_influence  1      0.28  833.79 -6070.9
## + simultaneous_action_selection  1      0.22  833.85 -6070.6
## + environmental                  1      0.15  833.92 -6070.3
## + betting.wagering               1      0.13  833.94 -6070.2
## + farming                        1      0.11  833.96 -6070.1
## + year                           1      0.07  834.00 -6069.9
## + economic                       1      0.04  834.04 -6069.8
## + line_drawing                   1      0.03  834.04 -6069.7
## + route.network_building         1      0.00  834.07 -6069.6
## + card_game                      1      0.00  834.07 -6069.6
## + set_collection                 1      0.00  834.07 -6069.6
## + stock_holding                  1      0.00  834.07 -6069.6
## - grid_movement                  1      5.93  840.00 -6045.8
## - deck_._pool_building           1     13.68  847.75 -6009.7
## - single_player                  1     21.78  855.85 -5972.4
## - weight                         1    317.60 1151.68 -4806.9
## 
## Step:  AIC=-6097.47
## avg_rating ~ weight + single_player + deck_._pool_building + 
##     grid_movement + storytelling
## 
##                                 Df Sum of Sq     RSS     AIC
## + campaign_._battle_card_driven  1      5.64  822.53 -6122.3
## + multi_player                   1      5.28  822.89 -6120.6
## + wargame                        1      3.97  824.20 -6114.4
## + variable_player_powers         1      3.64  824.53 -6112.8
## + rock.paper.scissors            1      2.10  826.07 -6105.5
## + variable_phase_order           1      1.92  826.25 -6104.6
## + dice_rolling                   1      1.82  826.35 -6104.1
## + player_elimination             1      1.54  826.63 -6102.8
## + press_your_luck                1      1.48  826.69 -6102.5
## + hand_management                1      1.22  826.95 -6101.2
## + card_drafting                  1      1.21  826.97 -6101.2
## + fighting                       1      0.75  827.42 -6099.0
## + city_building                  1      0.56  827.61 -6098.1
## + hex.and.counter                1      0.44  827.73 -6097.6
## + worker_placement               1      0.43  827.74 -6097.5
## <none>                                        828.17 -6097.5
## + point_to_point_movement        1      0.29  827.88 -6096.8
## + simultaneous_action_selection  1      0.21  827.96 -6096.5
## + area_control_._area_influence  1      0.21  827.96 -6096.5
## + environmental                  1      0.18  827.99 -6096.3
## + betting.wagering               1      0.17  828.00 -6096.3
## + murder.mystery                 1      0.14  828.03 -6096.2
## + farming                        1      0.14  828.04 -6096.1
## + year                           1      0.06  828.11 -6095.8
## + line_drawing                   1      0.04  828.13 -6095.7
## + economic                       1      0.02  828.15 -6095.6
## + card_game                      1      0.01  828.16 -6095.5
## + set_collection                 1      0.00  828.17 -6095.5
## + stock_holding                  1      0.00  828.17 -6095.5
## + route.network_building         1      0.00  828.17 -6095.5
## - storytelling                   1      5.90  834.07 -6071.6
## - grid_movement                  1      6.07  834.24 -6070.8
## - deck_._pool_building           1     14.09  842.26 -6033.2
## - single_player                  1     19.94  848.11 -6006.1
## - weight                         1    321.41 1149.58 -4812.0
## 
## Step:  AIC=-6122.32
## avg_rating ~ weight + single_player + deck_._pool_building + 
##     grid_movement + storytelling + campaign_._battle_card_driven
## 
##                                 Df Sum of Sq     RSS     AIC
## + multi_player                   1     4.781  817.75 -6143.2
## + variable_player_powers         1     3.495  819.03 -6137.0
## + rock.paper.scissors            1     2.295  820.23 -6131.3
## + wargame                        1     2.146  820.38 -6130.6
## + variable_phase_order           1     1.716  820.81 -6128.5
## + press_your_luck                1     1.584  820.94 -6127.9
## + player_elimination             1     1.439  821.09 -6127.2
## + card_drafting                  1     1.402  821.12 -6127.0
## + dice_rolling                   1     1.349  821.18 -6126.8
## + hand_management                1     0.993  821.53 -6125.1
## + fighting                       1     0.901  821.63 -6124.6
## + worker_placement               1     0.683  821.84 -6123.6
## + hex.and.counter                1     0.669  821.86 -6123.5
## <none>                                        822.53 -6122.3
## + city_building                  1     0.400  822.13 -6122.2
## + area_control_._area_influence  1     0.304  822.22 -6121.8
## + environmental                  1     0.229  822.30 -6121.4
## + simultaneous_action_selection  1     0.222  822.30 -6121.4
## + betting.wagering               1     0.202  822.33 -6121.3
## + farming                        1     0.192  822.33 -6121.2
## + murder.mystery                 1     0.156  822.37 -6121.1
## + line_drawing                   1     0.052  822.48 -6120.6
## + year                           1     0.045  822.48 -6120.5
## + set_collection                 1     0.039  822.49 -6120.5
## + stock_holding                  1     0.025  822.50 -6120.4
## + route.network_building         1     0.010  822.52 -6120.4
## + card_game                      1     0.010  822.52 -6120.4
## + point_to_point_movement        1     0.005  822.52 -6120.3
## + economic                       1     0.001  822.53 -6120.3
## - campaign_._battle_card_driven  1     5.644  828.17 -6097.5
## - storytelling                   1     5.917  828.44 -6096.2
## - grid_movement                  1     6.306  828.83 -6094.3
## - deck_._pool_building           1    14.776  837.30 -6054.4
## - single_player                  1    18.715  841.24 -6036.0
## - weight                         1   307.932 1130.46 -4875.9
## 
## Step:  AIC=-6143.21
## avg_rating ~ weight + single_player + deck_._pool_building + 
##     grid_movement + storytelling + campaign_._battle_card_driven + 
##     multi_player
## 
##                                 Df Sum of Sq     RSS     AIC
## + variable_player_powers         1     4.137  813.61 -6161.1
## + rock.paper.scissors            1     2.476  815.27 -6153.1
## + variable_phase_order           1     1.931  815.81 -6150.5
## + player_elimination             1     1.842  815.90 -6150.1
## + press_your_luck                1     1.787  815.96 -6149.8
## + card_drafting                  1     1.498  816.25 -6148.4
## + wargame                        1     1.304  816.44 -6147.5
## + dice_rolling                   1     1.082  816.66 -6146.4
## + fighting                       1     0.973  816.77 -6145.9
## + hand_management                1     0.875  816.87 -6145.4
## + worker_placement               1     0.742  817.00 -6144.8
## + city_building                  1     0.426  817.32 -6143.3
## <none>                                        817.75 -6143.2
## + simultaneous_action_selection  1     0.366  817.38 -6143.0
## + betting.wagering               1     0.348  817.40 -6142.9
## + area_control_._area_influence  1     0.338  817.41 -6142.8
## + environmental                  1     0.266  817.48 -6142.5
## + murder.mystery                 1     0.257  817.49 -6142.4
## + hex.and.counter                1     0.248  817.50 -6142.4
## + farming                        1     0.218  817.53 -6142.3
## + stock_holding                  1     0.205  817.54 -6142.2
## + line_drawing                   1     0.118  817.63 -6141.8
## + economic                       1     0.110  817.64 -6141.7
## + year                           1     0.069  817.68 -6141.5
## + route.network_building         1     0.067  817.68 -6141.5
## + set_collection                 1     0.036  817.71 -6141.4
## + point_to_point_movement        1     0.002  817.74 -6141.2
## + card_game                      1     0.000  817.75 -6141.2
## - multi_player                   1     4.781  822.53 -6122.3
## - campaign_._battle_card_driven  1     5.141  822.89 -6120.6
## - grid_movement                  1     5.346  823.09 -6119.6
## - storytelling                   1     6.497  824.24 -6114.1
## - deck_._pool_building           1    13.796  831.54 -6079.5
## - single_player                  1    23.422  841.17 -6034.3
## - weight                         1   290.097 1107.84 -4953.2
## 
## Step:  AIC=-6161.12
## avg_rating ~ weight + single_player + deck_._pool_building + 
##     grid_movement + storytelling + campaign_._battle_card_driven + 
##     multi_player + variable_player_powers
## 
##                                 Df Sum of Sq     RSS     AIC
## + rock.paper.scissors            1     2.875  810.73 -6173.0
## + wargame                        1     1.731  811.88 -6167.5
## + press_your_luck                1     1.687  811.92 -6167.3
## + variable_phase_order           1     1.560  812.05 -6166.7
## + card_drafting                  1     1.503  812.11 -6166.4
## + player_elimination             1     1.282  812.33 -6165.3
## + worker_placement               1     0.825  812.78 -6163.1
## + hex.and.counter                1     0.618  812.99 -6162.1
## + hand_management                1     0.559  813.05 -6161.8
## + dice_rolling                   1     0.494  813.12 -6161.5
## <none>                                        813.61 -6161.1
## + area_control_._area_influence  1     0.410  813.20 -6161.1
## + stock_holding                  1     0.375  813.23 -6160.9
## + city_building                  1     0.359  813.25 -6160.9
## + betting.wagering               1     0.341  813.27 -6160.8
## + farming                        1     0.315  813.29 -6160.6
## + simultaneous_action_selection  1     0.299  813.31 -6160.6
## + economic                       1     0.259  813.35 -6160.4
## + environmental                  1     0.239  813.37 -6160.3
## + murder.mystery                 1     0.169  813.44 -6159.9
## + line_drawing                   1     0.162  813.45 -6159.9
## + route.network_building         1     0.151  813.46 -6159.8
## + fighting                       1     0.096  813.51 -6159.6
## + set_collection                 1     0.079  813.53 -6159.5
## + year                           1     0.035  813.57 -6159.3
## + card_game                      1     0.017  813.59 -6159.2
## + point_to_point_movement        1     0.005  813.60 -6159.1
## - grid_movement                  1     3.651  817.26 -6145.5
## - variable_player_powers         1     4.137  817.75 -6143.2
## - campaign_._battle_card_driven  1     4.959  818.57 -6139.3
## - multi_player                   1     5.423  819.03 -6137.0
## - storytelling                   1     6.184  819.79 -6133.4
## - deck_._pool_building           1    12.749  826.36 -6102.1
## - single_player                  1    22.854  836.46 -6054.4
## - weight                         1   276.320 1089.93 -5015.2
## 
## Step:  AIC=-6173.01
## avg_rating ~ weight + single_player + deck_._pool_building + 
##     grid_movement + storytelling + campaign_._battle_card_driven + 
##     multi_player + variable_player_powers + rock.paper.scissors
## 
##                                 Df Sum of Sq     RSS     AIC
## + press_your_luck                1     1.672  809.06 -6179.1
## + wargame                        1     1.604  809.13 -6178.8
## + variable_phase_order           1     1.556  809.18 -6178.6
## + card_drafting                  1     1.388  809.35 -6177.7
## + player_elimination             1     1.318  809.42 -6177.4
## + worker_placement               1     0.768  809.97 -6174.7
## + simultaneous_action_selection  1     0.617  810.12 -6174.0
## + hand_management                1     0.608  810.13 -6174.0
## + hex.and.counter                1     0.581  810.15 -6173.8
## + area_control_._area_influence  1     0.449  810.29 -6173.2
## + dice_rolling                   1     0.419  810.32 -6173.0
## <none>                                        810.73 -6173.0
## + stock_holding                  1     0.411  810.32 -6173.0
## + city_building                  1     0.366  810.37 -6172.8
## + betting.wagering               1     0.323  810.41 -6172.6
## + farming                        1     0.303  810.43 -6172.5
## + environmental                  1     0.226  810.51 -6172.1
## + economic                       1     0.225  810.51 -6172.1
## + line_drawing                   1     0.157  810.58 -6171.8
## + murder.mystery                 1     0.155  810.58 -6171.8
## + fighting                       1     0.149  810.59 -6171.7
## + route.network_building         1     0.136  810.60 -6171.7
## + set_collection                 1     0.065  810.67 -6171.3
## + year                           1     0.036  810.70 -6171.2
## + card_game                      1     0.016  810.72 -6171.1
## + point_to_point_movement        1     0.004  810.73 -6171.0
## - rock.paper.scissors            1     2.875  813.61 -6161.1
## - grid_movement                  1     3.651  814.38 -6157.4
## - variable_player_powers         1     4.536  815.27 -6153.1
## - campaign_._battle_card_driven  1     5.142  815.88 -6150.2
## - multi_player                   1     5.663  816.40 -6147.7
## - storytelling                   1     6.085  816.82 -6145.7
## - deck_._pool_building           1    13.035  823.77 -6112.4
## - single_player                  1    23.197  833.93 -6064.3
## - weight                         1   275.123 1085.86 -5027.9
## 
## Step:  AIC=-6179.12
## avg_rating ~ weight + single_player + deck_._pool_building + 
##     grid_movement + storytelling + campaign_._battle_card_driven + 
##     multi_player + variable_player_powers + rock.paper.scissors + 
##     press_your_luck
## 
##                                 Df Sum of Sq     RSS     AIC
## + wargame                        1     1.686  807.38 -6185.3
## + variable_phase_order           1     1.547  807.52 -6184.6
## + card_drafting                  1     1.411  807.65 -6184.0
## + player_elimination             1     1.174  807.89 -6182.8
## + worker_placement               1     0.712  808.35 -6180.6
## + hand_management                1     0.699  808.36 -6180.5
## + simultaneous_action_selection  1     0.652  808.41 -6180.3
## + hex.and.counter                1     0.593  808.47 -6180.0
## + stock_holding                  1     0.425  808.64 -6179.2
## <none>                                        809.06 -6179.1
## + area_control_._area_influence  1     0.406  808.66 -6179.1
## + city_building                  1     0.336  808.73 -6178.8
## + farming                        1     0.316  808.75 -6178.7
## + betting.wagering               1     0.255  808.81 -6178.4
## + dice_rolling                   1     0.245  808.82 -6178.3
## + economic                       1     0.233  808.83 -6178.2
## + environmental                  1     0.206  808.86 -6178.1
## + murder.mystery                 1     0.186  808.88 -6178.0
## + line_drawing                   1     0.183  808.88 -6178.0
## + fighting                       1     0.159  808.90 -6177.9
## + route.network_building         1     0.155  808.91 -6177.9
## + set_collection                 1     0.035  809.03 -6177.3
## + year                           1     0.029  809.03 -6177.3
## + card_game                      1     0.007  809.06 -6177.2
## + point_to_point_movement        1     0.005  809.06 -6177.1
## - press_your_luck                1     1.672  810.73 -6173.0
## - rock.paper.scissors            1     2.860  811.92 -6167.3
## - grid_movement                  1     3.616  812.68 -6163.6
## - variable_player_powers         1     4.431  813.49 -6159.7
## - campaign_._battle_card_driven  1     5.234  814.30 -6155.8
## - multi_player                   1     5.867  814.93 -6152.8
## - storytelling                   1     6.313  815.37 -6150.6
## - deck_._pool_building           1    13.096  822.16 -6118.1
## - single_player                  1    22.824  831.89 -6071.9
## - weight                         1   275.526 1084.59 -5030.5
## 
## Step:  AIC=-6185.31
## avg_rating ~ weight + single_player + deck_._pool_building + 
##     grid_movement + storytelling + campaign_._battle_card_driven + 
##     multi_player + variable_player_powers + rock.paper.scissors + 
##     press_your_luck + wargame
## 
##                                 Df Sum of Sq     RSS     AIC
## + card_drafting                  1     1.948  805.43 -6192.8
## + variable_phase_order           1     1.574  805.80 -6191.0
## + worker_placement               1     1.326  806.05 -6189.8
## + hand_management                1     1.118  806.26 -6188.8
## + player_elimination             1     1.041  806.33 -6188.4
## + economic                       1     0.798  806.58 -6187.2
## + stock_holding                  1     0.718  806.66 -6186.8
## + simultaneous_action_selection  1     0.588  806.79 -6186.2
## + farming                        1     0.453  806.92 -6185.5
## <none>                                        807.38 -6185.3
## + route.network_building         1     0.407  806.97 -6185.3
## + betting.wagering               1     0.282  807.09 -6184.7
## + environmental                  1     0.275  807.10 -6184.6
## + fighting                       1     0.245  807.13 -6184.5
## + area_control_._area_influence  1     0.229  807.15 -6184.4
## + murder.mystery                 1     0.216  807.16 -6184.4
## + line_drawing                   1     0.191  807.19 -6184.2
## + set_collection                 1     0.164  807.21 -6184.1
## + city_building                  1     0.141  807.24 -6184.0
## + dice_rolling                   1     0.034  807.34 -6183.5
## + year                           1     0.026  807.35 -6183.4
## + point_to_point_movement        1     0.005  807.37 -6183.3
## + card_game                      1     0.003  807.37 -6183.3
## + hex.and.counter                1     0.000  807.38 -6183.3
## - wargame                        1     1.686  809.06 -6179.1
## - press_your_luck                1     1.755  809.13 -6178.8
## - rock.paper.scissors            1     2.729  810.10 -6174.1
## - campaign_._battle_card_driven  1     3.654  811.03 -6169.6
## - grid_movement                  1     4.047  811.42 -6167.7
## - multi_player                   1     4.853  812.23 -6163.8
## - variable_player_powers         1     4.854  812.23 -6163.8
## - storytelling                   1     6.585  813.96 -6155.4
## - deck_._pool_building           1    14.097  821.47 -6119.4
## - single_player                  1    21.461  828.84 -6084.3
## - weight                         1   217.458 1024.83 -5251.0
## 
## Step:  AIC=-6192.8
## avg_rating ~ weight + single_player + deck_._pool_building + 
##     grid_movement + storytelling + campaign_._battle_card_driven + 
##     multi_player + variable_player_powers + rock.paper.scissors + 
##     press_your_luck + wargame + card_drafting
## 
##                                 Df Sum of Sq     RSS     AIC
## + variable_phase_order           1     1.503  803.92 -6198.1
## + worker_placement               1     1.194  804.23 -6196.6
## + player_elimination             1     1.019  804.41 -6195.8
## + stock_holding                  1     0.797  804.63 -6194.7
## + economic                       1     0.762  804.67 -6194.5
## + hand_management                1     0.760  804.67 -6194.5
## + simultaneous_action_selection  1     0.556  804.87 -6193.5
## + route.network_building         1     0.421  805.01 -6192.8
## <none>                                        805.43 -6192.8
## + farming                        1     0.399  805.03 -6192.7
## + betting.wagering               1     0.344  805.08 -6192.5
## + fighting                       1     0.308  805.12 -6192.3
## + area_control_._area_influence  1     0.290  805.14 -6192.2
## + city_building                  1     0.242  805.19 -6192.0
## + murder.mystery                 1     0.233  805.19 -6191.9
## + line_drawing                   1     0.211  805.22 -6191.8
## + environmental                  1     0.158  805.27 -6191.6
## + dice_rolling                   1     0.061  805.37 -6191.1
## + card_game                      1     0.041  805.39 -6191.0
## + set_collection                 1     0.022  805.41 -6190.9
## + year                           1     0.013  805.41 -6190.9
## + hex.and.counter                1     0.006  805.42 -6190.8
## + point_to_point_movement        1     0.005  805.42 -6190.8
## - press_your_luck                1     1.797  807.22 -6186.0
## - card_drafting                  1     1.948  807.38 -6185.3
## - wargame                        1     2.223  807.65 -6184.0
## - rock.paper.scissors            1     2.575  808.00 -6182.3
## - campaign_._battle_card_driven  1     3.643  809.07 -6177.1
## - grid_movement                  1     4.379  809.81 -6173.5
## - multi_player                   1     4.824  810.25 -6171.4
## - variable_player_powers         1     4.923  810.35 -6170.9
## - storytelling                   1     6.693  812.12 -6162.3
## - deck_._pool_building           1    11.998  817.43 -6136.7
## - single_player                  1    21.476  826.90 -6091.5
## - weight                         1   213.946 1019.37 -5269.9
## 
## Step:  AIC=-6198.13
## avg_rating ~ weight + single_player + deck_._pool_building + 
##     grid_movement + storytelling + campaign_._battle_card_driven + 
##     multi_player + variable_player_powers + rock.paper.scissors + 
##     press_your_luck + wargame + card_drafting + variable_phase_order
## 
##                                 Df Sum of Sq     RSS     AIC
## + worker_placement               1     0.994  802.93 -6201.0
## + player_elimination             1     0.986  802.94 -6200.9
## + stock_holding                  1     0.810  803.11 -6200.1
## + hand_management                1     0.746  803.18 -6199.8
## + economic                       1     0.628  803.30 -6199.2
## + simultaneous_action_selection  1     0.503  803.42 -6198.6
## + route.network_building         1     0.428  803.50 -6198.2
## <none>                                        803.92 -6198.1
## + betting.wagering               1     0.361  803.56 -6197.9
## + fighting                       1     0.354  803.57 -6197.9
## + area_control_._area_influence  1     0.346  803.58 -6197.8
## + city_building                  1     0.343  803.58 -6197.8
## + farming                        1     0.333  803.59 -6197.8
## + murder.mystery                 1     0.252  803.67 -6197.4
## + line_drawing                   1     0.215  803.71 -6197.2
## + environmental                  1     0.182  803.74 -6197.0
## + dice_rolling                   1     0.071  803.85 -6196.5
## + card_game                      1     0.042  803.88 -6196.3
## + hex.and.counter                1     0.021  803.90 -6196.2
## + set_collection                 1     0.011  803.91 -6196.2
## + year                           1     0.010  803.91 -6196.2
## + point_to_point_movement        1     0.001  803.92 -6196.1
## - variable_phase_order           1     1.503  805.43 -6192.8
## - press_your_luck                1     1.787  805.71 -6191.4
## - card_drafting                  1     1.877  805.80 -6191.0
## - wargame                        1     2.242  806.17 -6189.2
## - rock.paper.scissors            1     2.573  806.50 -6187.6
## - campaign_._battle_card_driven  1     3.483  807.41 -6183.2
## - grid_movement                  1     4.413  808.34 -6178.6
## - variable_player_powers         1     4.527  808.45 -6178.1
## - multi_player                   1     4.979  808.90 -6175.9
## - storytelling                   1     6.803  810.73 -6167.0
## - deck_._pool_building           1    12.235  816.16 -6140.8
## - single_player                  1    21.730  825.65 -6095.4
## - weight                         1   206.632 1010.56 -5302.1
## 
## Step:  AIC=-6200.99
## avg_rating ~ weight + single_player + deck_._pool_building + 
##     grid_movement + storytelling + campaign_._battle_card_driven + 
##     multi_player + variable_player_powers + rock.paper.scissors + 
##     press_your_luck + wargame + card_drafting + variable_phase_order + 
##     worker_placement
## 
##                                 Df Sum of Sq    RSS     AIC
## + player_elimination             1     1.001 801.93 -6203.9
## + stock_holding                  1     0.963 801.97 -6203.7
## + hand_management                1     0.839 802.09 -6203.1
## + route.network_building         1     0.557 802.37 -6201.7
## + simultaneous_action_selection  1     0.510 802.42 -6201.5
## + economic                       1     0.474 802.46 -6201.3
## + city_building                  1     0.425 802.51 -6201.1
## + fighting                       1     0.415 802.52 -6201.0
## <none>                                       802.93 -6201.0
## + area_control_._area_influence  1     0.395 802.54 -6200.9
## + betting.wagering               1     0.365 802.57 -6200.8
## + murder.mystery                 1     0.280 802.65 -6200.4
## + farming                        1     0.236 802.69 -6200.1
## + line_drawing                   1     0.224 802.71 -6200.1
## + environmental                  1     0.166 802.76 -6199.8
## + dice_rolling                   1     0.073 802.86 -6199.3
## + hex.and.counter                1     0.047 802.88 -6199.2
## + card_game                      1     0.017 802.91 -6199.1
## + year                           1     0.006 802.92 -6199.0
## + set_collection                 1     0.001 802.93 -6199.0
## + point_to_point_movement        1     0.001 802.93 -6199.0
## - worker_placement               1     0.994 803.92 -6198.1
## - variable_phase_order           1     1.303 804.23 -6196.6
## - press_your_luck                1     1.731 804.66 -6194.5
## - card_drafting                  1     1.763 804.69 -6194.4
## - rock.paper.scissors            1     2.494 805.42 -6190.8
## - wargame                        1     2.797 805.73 -6189.3
## - campaign_._battle_card_driven  1     3.546 806.48 -6185.7
## - grid_movement                  1     4.685 807.62 -6180.1
## - variable_player_powers         1     4.714 807.64 -6180.0
## - multi_player                   1     4.894 807.82 -6179.1
## - storytelling                   1     6.764 809.69 -6170.1
## - deck_._pool_building           1    12.382 815.31 -6142.9
## - single_player                  1    21.299 824.23 -6100.2
## - weight                         1   186.807 989.74 -5381.8
## 
## Step:  AIC=-6203.88
## avg_rating ~ weight + single_player + deck_._pool_building + 
##     grid_movement + storytelling + campaign_._battle_card_driven + 
##     multi_player + variable_player_powers + rock.paper.scissors + 
##     press_your_luck + wargame + card_drafting + variable_phase_order + 
##     worker_placement + player_elimination
## 
##                                 Df Sum of Sq    RSS     AIC
## + stock_holding                  1     0.962 800.97 -6206.6
## + hand_management                1     0.764 801.17 -6205.6
## + route.network_building         1     0.546 801.38 -6204.6
## + economic                       1     0.474 801.46 -6204.2
## <none>                                       801.93 -6203.9
## + city_building                  1     0.405 801.52 -6203.9
## + simultaneous_action_selection  1     0.405 801.52 -6203.9
## + area_control_._area_influence  1     0.396 801.53 -6203.8
## + betting.wagering               1     0.358 801.57 -6203.6
## + fighting                       1     0.302 801.63 -6203.4
## + murder.mystery                 1     0.272 801.66 -6203.2
## + line_drawing                   1     0.239 801.69 -6203.1
## + farming                        1     0.234 801.70 -6203.0
## + environmental                  1     0.178 801.75 -6202.8
## + dice_rolling                   1     0.090 801.84 -6202.3
## + hex.and.counter                1     0.057 801.87 -6202.2
## + card_game                      1     0.038 801.89 -6202.1
## + set_collection                 1     0.005 801.92 -6201.9
## + year                           1     0.005 801.92 -6201.9
## + point_to_point_movement        1     0.000 801.93 -6201.9
## - player_elimination             1     1.001 802.93 -6201.0
## - worker_placement               1     1.009 802.94 -6200.9
## - variable_phase_order           1     1.271 803.20 -6199.7
## - press_your_luck                1     1.592 803.52 -6198.1
## - card_drafting                  1     1.742 803.67 -6197.4
## - rock.paper.scissors            1     2.528 804.46 -6193.5
## - wargame                        1     2.633 804.56 -6193.0
## - campaign_._battle_card_driven  1     3.520 805.45 -6188.7
## - variable_player_powers         1     4.169 806.10 -6185.5
## - grid_movement                  1     4.523 806.45 -6183.8
## - multi_player                   1     5.174 807.10 -6180.6
## - storytelling                   1     6.780 808.71 -6172.8
## - deck_._pool_building           1    12.272 814.20 -6146.3
## - single_player                  1    21.635 823.56 -6101.4
## - weight                         1   187.800 989.73 -5379.8
## 
## Step:  AIC=-6206.59
## avg_rating ~ weight + single_player + deck_._pool_building + 
##     grid_movement + storytelling + campaign_._battle_card_driven + 
##     multi_player + variable_player_powers + rock.paper.scissors + 
##     press_your_luck + wargame + card_drafting + variable_phase_order + 
##     worker_placement + player_elimination + stock_holding
## 
##                                 Df Sum of Sq    RSS     AIC
## + hand_management                1     0.866 800.10 -6208.8
## + simultaneous_action_selection  1     0.442 800.53 -6206.8
## <none>                                       800.97 -6206.6
## + fighting                       1     0.360 800.61 -6206.4
## + betting.wagering               1     0.354 800.61 -6206.3
## + city_building                  1     0.352 800.62 -6206.3
## + area_control_._area_influence  1     0.347 800.62 -6206.3
## + murder.mystery                 1     0.294 800.67 -6206.0
## + farming                        1     0.253 800.71 -6205.8
## + line_drawing                   1     0.251 800.72 -6205.8
## + route.network_building         1     0.222 800.75 -6205.7
## + environmental                  1     0.198 800.77 -6205.6
## + economic                       1     0.197 800.77 -6205.6
## + dice_rolling                   1     0.113 800.85 -6205.1
## + hex.and.counter                1     0.084 800.88 -6205.0
## + card_game                      1     0.026 800.94 -6204.7
## + set_collection                 1     0.008 800.96 -6204.6
## + year                           1     0.003 800.97 -6204.6
## + point_to_point_movement        1     0.000 800.97 -6204.6
## - stock_holding                  1     0.962 801.93 -6203.9
## - player_elimination             1     1.000 801.97 -6203.7
## - worker_placement               1     1.162 802.13 -6202.9
## - variable_phase_order           1     1.271 802.24 -6202.4
## - press_your_luck                1     1.618 802.59 -6200.7
## - card_drafting                  1     1.816 802.78 -6199.7
## - rock.paper.scissors            1     2.559 803.53 -6196.1
## - wargame                        1     3.092 804.06 -6193.5
## - campaign_._battle_card_driven  1     3.545 804.51 -6191.3
## - variable_player_powers         1     4.513 805.48 -6186.5
## - grid_movement                  1     4.617 805.59 -6186.0
## - multi_player                   1     5.634 806.60 -6181.1
## - storytelling                   1     6.872 807.84 -6175.1
## - deck_._pool_building           1    12.496 813.46 -6147.8
## - single_player                  1    22.343 823.31 -6100.6
## - weight                         1   170.137 971.11 -5452.4
## 
## Step:  AIC=-6208.84
## avg_rating ~ weight + single_player + deck_._pool_building + 
##     grid_movement + storytelling + campaign_._battle_card_driven + 
##     multi_player + variable_player_powers + rock.paper.scissors + 
##     press_your_luck + wargame + card_drafting + variable_phase_order + 
##     worker_placement + player_elimination + stock_holding + hand_management
## 
##                                 Df Sum of Sq    RSS     AIC
## <none>                                       800.10 -6208.8
## + area_control_._area_influence  1     0.378 799.72 -6208.7
## + simultaneous_action_selection  1     0.377 799.72 -6208.7
## + betting.wagering               1     0.350 799.75 -6208.6
## + fighting                       1     0.347 799.76 -6208.5
## + city_building                  1     0.342 799.76 -6208.5
## + murder.mystery                 1     0.298 799.80 -6208.3
## + line_drawing                   1     0.285 799.82 -6208.2
## + card_game                      1     0.276 799.83 -6208.2
## + route.network_building         1     0.243 799.86 -6208.0
## + farming                        1     0.222 799.88 -6207.9
## + economic                       1     0.215 799.89 -6207.9
## + environmental                  1     0.184 799.92 -6207.7
## + dice_rolling                   1     0.170 799.93 -6207.7
## + hex.and.counter                1     0.125 799.98 -6207.5
## + set_collection                 1     0.002 800.10 -6206.9
## + year                           1     0.001 800.10 -6206.8
## + point_to_point_movement        1     0.000 800.10 -6206.8
## - hand_management                1     0.866 800.97 -6206.6
## - player_elimination             1     0.920 801.02 -6206.3
## - stock_holding                  1     1.063 801.17 -6205.6
## - variable_phase_order           1     1.249 801.35 -6204.7
## - worker_placement               1     1.272 801.37 -6204.6
## - card_drafting                  1     1.451 801.55 -6203.7
## - press_your_luck                1     1.730 801.83 -6202.4
## - rock.paper.scissors            1     2.612 802.71 -6198.0
## - campaign_._battle_card_driven  1     3.256 803.36 -6194.9
## - wargame                        1     3.543 803.64 -6193.5
## - variable_player_powers         1     4.218 804.32 -6190.2
## - grid_movement                  1     4.828 804.93 -6187.2
## - multi_player                   1     5.382 805.48 -6184.5
## - storytelling                   1     7.060 807.16 -6176.4
## - deck_._pool_building           1    11.656 811.76 -6154.1
## - single_player                  1    22.511 822.61 -6101.9
## - weight                         1   170.638 970.74 -5451.9
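The AIC column in the stepwise tables can be reproduced by hand. For linear models, `step()` scores candidates with `extractAIC()`, which computes n * log(RSS / n) + 2 * edf rather than the full log-likelihood. The sketch below checks this against the last step. It assumes n = 3926 training rows (the 3900 residual degrees of freedom plus the 26 coefficients in the model summary further down) and edf = 18 (17 predictors plus the intercept). The variable names are introduced here for illustration only.

```r
# Reproduce the AIC reported for the final stepwise model.
# Assumed inputs, read off the output above: RSS = 800.10, 17 predictors.
n_train   <- 3926      # training rows (3900 residual df + 26 coefficients)
rss_final <- 800.10    # RSS on the <none> row of the last step
edf_final <- 17 + 1    # 17 predictors plus the intercept

# extractAIC() for lm objects: n * log(RSS / n) + 2 * edf
aic_final <- n_train * log(rss_final / n_train) + 2 * edf_final
aic_final              # close to the reported -6208.84 (the RSS is rounded)

# Each extra coefficient costs 2, so a term is worth adding only if it
# reduces n * log(RSS / n) by more than 2.
```

This is why the procedure stops at the last step: every remaining candidate lowers the RSS by too little to offset the penalty for one more coefficient.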
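#finalized linear model
#note: this formula differs from the last stepwise step shown above
#(it adds terms such as year and card_game and omits hand_management, among others)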
mod1 <- lm(avg_rating ~ weight + year + wargame + single_player +
    deck_._pool_building + storytelling + grid_movement + hex.and.counter +
    campaign_._battle_card_driven + multi_player + city_building +
    rock.paper.scissors + simultaneous_action_selection + area_control_._area_influence +
    line_drawing + betting.wagering + route.network_building +
    press_your_luck + murder.mystery + fighting + card_drafting +
    stock_holding + variable_player_powers + card_game + point_to_point_movement,
    data = train_set)
summary(mod1)
## 
## Call:
## lm(formula = avg_rating ~ weight + year + wargame + single_player + 
##     deck_._pool_building + storytelling + grid_movement + hex.and.counter + 
##     campaign_._battle_card_driven + multi_player + city_building + 
##     rock.paper.scissors + simultaneous_action_selection + area_control_._area_influence + 
##     line_drawing + betting.wagering + route.network_building + 
##     press_your_luck + murder.mystery + fighting + card_drafting + 
##     stock_holding + variable_player_powers + card_game + point_to_point_movement, 
##     data = train_set)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.45454 -0.30833 -0.01586  0.29416  1.74102 
## 
## Coefficients:
##                                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                    5.947e+00  1.174e-01  50.643  < 2e-16 ***
## weight                         3.302e-01  1.168e-02  28.267  < 2e-16 ***
## year                           2.129e-05  5.706e-05   0.373 0.709146    
## wargame                        7.426e-02  2.706e-02   2.744 0.006097 ** 
## single_player                  2.380e-01  2.285e-02  10.415  < 2e-16 ***
## deck_._pool_building           2.606e-01  3.437e-02   7.581 4.25e-14 ***
## storytelling                   3.459e-01  6.216e-02   5.564 2.81e-08 ***
## grid_movement                  1.393e-01  3.146e-02   4.428 9.77e-06 ***
## hex.and.counter                6.086e-03  3.368e-02   0.181 0.856636    
## campaign_._battle_card_driven  1.747e-01  4.106e-02   4.254 2.15e-05 ***
## multi_player                   8.879e-02  1.661e-02   5.346 9.51e-08 ***
## city_building                 -2.900e-02  3.532e-02  -0.821 0.411666    
## rock.paper.scissors           -3.107e-01  8.173e-02  -3.802 0.000146 ***
## simultaneous_action_selection  5.108e-02  2.778e-02   1.838 0.066090 .  
## area_control_._area_influence -2.345e-02  2.341e-02  -1.002 0.316491    
## line_drawing                   1.154e-01  1.078e-01   1.071 0.284283    
## betting.wagering               7.809e-02  5.507e-02   1.418 0.156286    
## route.network_building         3.626e-02  3.664e-02   0.989 0.322534    
## press_your_luck                1.164e-01  3.935e-02   2.958 0.003110 ** 
## murder.mystery                 7.515e-02  6.378e-02   1.178 0.238788    
## fighting                       3.617e-02  2.637e-02   1.371 0.170311    
## card_drafting                  7.951e-02  2.371e-02   3.353 0.000806 ***
## stock_holding                  8.481e-02  5.239e-02   1.619 0.105596    
## variable_player_powers         9.357e-02  2.186e-02   4.280 1.92e-05 ***
## card_game                     -6.678e-03  1.835e-02  -0.364 0.715951    
## point_to_point_movement       -7.111e-03  3.225e-02  -0.221 0.825465    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4535 on 3900 degrees of freedom
## Multiple R-squared:  0.3506, Adjusted R-squared:  0.3464 
## F-statistic: 84.22 on 25 and 3900 DF,  p-value: < 2.2e-16
#correlation between test-set predictions and actual ratings
predictions <- predict(mod1, test_set)
cor(predictions, test_set$avg_rating)
## [1] 0.6160393
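The correlation shows how closely the predictions track the actual ratings, but not how large the errors are. A root-mean-squared error on the test set gives an error size in rating points, on the same scale as the RMSE reported for KNN in the next subsection (that figure is cross-validated on the training set, so the comparison is only approximate). The helper below is a minimal sketch. `rmse` is a name introduced here, not part of the original analysis, and the toy vectors only demonstrate the call.

```r
# Root-mean-squared error between actual and predicted values.
# `rmse` is a hypothetical helper, not defined in the original analysis.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2, na.rm = TRUE))
}

# Toy check: errors of 0, 0 and 2 give sqrt(4 / 3)
toy_rmse <- rmse(c(7.0, 6.5, 8.0), c(7.0, 6.5, 6.0))

# With the objects created above, the call would be:
# rmse(test_set$avg_rating, predictions)
```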
#graph of predictions vs actual rating
data.frame(actual = test_set$avg_rating, predicted = predictions) %>%
  ggplot(aes(actual, predicted)) +
  geom_point(color = "pink") +
  stat_ellipse(color = "pink") +
  xlab("Actual Average Rating") +
  ylab("Predicted Average Rating") +
  ggtitle("Actual vs Predicted Average Rating")

#graph of linear model coefficients, ordered by value
mod_coef <- data.frame(variable = names(coef(mod1)),
                       estimate = unname(coef(mod1)))

#drop the intercept (first row) before plotting
ggplot(mod_coef[-1,], aes(x = reorder(variable, estimate), y = estimate, fill = estimate)) +
  geom_bar(stat="identity") +
  coord_flip() +
  labs(title = "Final Linear Model Coefficients",y="Coefficients",x="Variable")+
  scale_fill_gradient(low="grey50", high="grey50")+
  theme_light() +
  guides(fill=guide_legend(title="Coefficient values"))

5.2.2 K-Nearest Neighbors

# train knn
knnFit <- train(avg_rating ~.,
             data = train_set,
             method = "knn",
             na.action=na.exclude,
             trControl = control,
             preProcess = c("center", "scale"),
             tuneLength = 10)
knnFit
## k-Nearest Neighbors 
## 
## 3926 samples
##  151 predictor
## 
## Pre-processing: centered (151), scaled (151) 
## Resampling: Cross-Validated (20 fold) 
## Summary of sample sizes: 3730, 3729, 3730, 3730, 3730, 3729, ... 
## Resampling results across tuning parameters:
## 
##   k   RMSE       Rsquared   MAE      
##    5  0.4825633  0.2887958  0.3764107
##    7  0.4760223  0.2996555  0.3733424
##    9  0.4737251  0.3037911  0.3736822
##   11  0.4717253  0.3106384  0.3725916
##   13  0.4700448  0.3159743  0.3715325
##   15  0.4695079  0.3175796  0.3715943
##   17  0.4687577  0.3210255  0.3712425
##   19  0.4694049  0.3205389  0.3716843
##   21  0.4682217  0.3259626  0.3703395
##   23  0.4686133  0.3268853  0.3703274
## 
## RMSE was used to select the optimal model using  the smallest value.
## The final value used for the model was k = 21.
#plot it
knnFit %>% ggplot() +
  geom_point(color='pink') +
  geom_line(color ='pink')

#prediction
knnPredict <- predict(knnFit,newdata = test_set) # 932

#plot actual vs prediction
data <- data.frame(cbind(test_set$avg_rating,knnPredict))
names(data) <- c('avg_rating','knnPredict')

data %>% ggplot(aes(avg_rating, knnPredict)) +
  geom_point(color = 'pink') +
  xlim(c(6,8)) +
  ylim(c(6,8)) +
  ylab('KNN_prediction') +
  xlab('Actual average_rating')
## Warning: Removed 54 rows containing missing values (geom_point).
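
To make the method concrete, here is a minimal base R sketch of what k-nearest-neighbors regression with centering and scaling does for a single new observation. It is an illustration under stated assumptions, not the code `caret` runs internally: `knn_predict()` is a hypothetical helper, and the built-in `mtcars` data stands in for the board game data.

```r
# Hypothetical helper: predict one new observation by k-nearest-neighbor regression
knn_predict <- function(train_x, train_y, new_x, k) {
  # center and scale using training-set statistics only
  mu    <- colMeans(train_x)
  sigma <- apply(train_x, 2, sd)
  z_train <- sweep(sweep(train_x, 2, mu), 2, sigma, "/")
  z_new   <- (new_x - mu) / sigma

  # Euclidean distance from the new point to every training row
  d <- sqrt(rowSums(sweep(z_train, 2, z_new)^2))

  # average the outcome over the k closest training rows
  mean(train_y[order(d)[seq_len(k)]])
}

# toy example: predict mpg for the first car from the remaining cars
x <- as.matrix(mtcars[, c("wt", "hp", "disp")])
y <- mtcars$mpg
knn_predict(x[-1, ], y[-1], x[1, ], k = 5)
```

A small k follows local noise, while a large k averages over dissimilar games; this trade-off is what the cross-validated RMSE table above is tuning.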

5.2.3 Random Forest

# select needed columns
train_set_select <- train_set[, c(2:8, 11, 13:152)]

# run randomForest with all features
fit <- randomForest(avg_rating ~  .,
      data = train_set_select,
      ntree = 500)

# plot feature importance
impt <- as.data.frame(importance(fit))
impt$variable <- rownames(impt)  # importance() labels each row with its predictor name
impt <- transform(impt, variable = reorder(variable, IncNodePurity))
impt %>%
  arrange(desc(IncNodePurity)) %>%
  ggplot() +
  geom_bar(aes(y = IncNodePurity, x = variable), stat = 'identity') +
  coord_flip() +
  ylab("Feature Importance") +
  xlab("Feature") +
  ggtitle("Board Game Features Ranked by Importance") +
  theme(text = element_text(size=2))

Top 20 Most Important Features

# select top 20 most important features
impt_feature <- impt %>%
  arrange(desc(IncNodePurity)) %>%
  head(20) %>%
  select(variable)
impt_feature <- as.character(as.vector(impt_feature$variable))

impt_feature_df <- impt %>%
  filter(variable %in% impt_feature)
impt_feature_df %>%
  arrange(desc(IncNodePurity)) %>%
  ggplot() +
  geom_bar(aes(y = IncNodePurity, x = variable), stat = 'identity') +
  coord_flip() +
  ylab("Feature Importance") +
  xlab("Feature") +
  ggtitle("Board Game Features Ranked by Importance")

# tune mtry: the number of variables randomly sampled as candidates at each split
# note: RMSE is computed on test_set, so mtry is being tuned against the test data
RMSE_mtry <- c()
for (m in 1:30) {
  fit <- randomForest(avg_rating ~  .,
      data = train_set_select,
      ntree = 100,
      mtry = m)

  # make predictions on test set
  predictions <- predict(fit, test_set)

  # calculate RMSE
  RMSE <- sqrt(sum((predictions - test_set$avg_rating)^2)/length(predictions))
  print(RMSE)
  RMSE_mtry <- c(RMSE_mtry, RMSE)
}
## [1] 0.4962323
## [1] 0.4414347
## [1] 0.4199963
## [1] 0.4072601
## [1] 0.3996708
## [1] 0.3900891
## [1] 0.3839525
## [1] 0.3820635
## [1] 0.3807499
## [1] 0.3791024
## [1] 0.3781362
## [1] 0.3766863
## [1] 0.3740838
## [1] 0.3728372
## [1] 0.3704151
## [1] 0.3693029
## [1] 0.3713737
## [1] 0.3705476
## [1] 0.3703653
## [1] 0.3715319
## [1] 0.3688442
## [1] 0.3695915
## [1] 0.3713514
## [1] 0.3704681
## [1] 0.3694058
## [1] 0.3683989
## [1] 0.3683301
## [1] 0.369796
## [1] 0.371918
## [1] 0.3686873
# get minimum RMSE
which.min(RMSE_mtry)
## [1] 27
# plot RMSE with mtry
ggplot() +
  geom_line(aes(x = 1:30, y = RMSE_mtry)) +
  geom_point(aes(x = 1:30, y = RMSE_mtry)) +
  ggtitle("Choose the Best Number of Variables to Include at Each Split") +
  xlab("mtry") +
  ylab("RMSE")

# change number of trees to grow
RMSE_ntree <- c()
for (n in 1:10) {
  fit <- randomForest(avg_rating ~  .,
      data = train_set_select,
      ntree = n * 50,
      mtry = which.min(RMSE_mtry))

  # make predictions on test set
  predictions <- predict(fit, test_set)

  # calculate RMSE
  RMSE <- sqrt(sum((predictions - test_set$avg_rating)^2)/length(predictions))
  print(RMSE)
  RMSE_ntree <- c(RMSE_ntree, RMSE)
}
## [1] 0.371073
## [1] 0.368581
## [1] 0.3701315
## [1] 0.3686847
## [1] 0.3681581
## [1] 0.3694296
## [1] 0.3682721
## [1] 0.3678488
## [1] 0.3671117
## [1] 0.3682236
ggplot() +
  geom_line(aes(x = seq(50, 500, 50), y = RMSE_ntree)) +
  geom_point(aes(x = seq(50, 500, 50), y = RMSE_ntree)) +
  ggtitle("Choose the Best Number of Trees to Grow") +
  xlab("ntree") +
  ylab("RMSE")

which.min(RMSE_ntree)
## [1] 9
min(RMSE_ntree)
## [1] 0.3671117
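
Both loops above choose `mtry` and `ntree` by RMSE on `test_set`, so the final test RMSE is likely to be somewhat optimistic. One alternative, sketched here under stated assumptions, is to tune on the out-of-bag (OOB) error that `randomForest` already stores for regression forests in the `mse` component. The `mtcars` data is only a stand-in for `train_set_select`, and the grid and `ntree` value are arbitrary.

```r
library(randomForest)

set.seed(1)  # forests are random; fix the seed for reproducibility

# toy stand-in for train_set_select: predict mpg from the other mtcars columns
mtry_grid <- 1:10
oob_rmse <- sapply(mtry_grid, function(m) {
  fit <- randomForest(mpg ~ ., data = mtcars, ntree = 200, mtry = m)
  # for regression forests, fit$mse[i] is the out-of-bag MSE using the first i trees
  sqrt(fit$mse[length(fit$mse)])
})

# mtry with the lowest OOB RMSE; the test set is never touched
best_mtry <- mtry_grid[which.min(oob_rmse)]
best_mtry
```

With this approach the test set would be used only once, to report the error of the final model.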

Fit a model using our best mtry and ntree.

# best model
fit <- randomForest(avg_rating ~  .,
      data = train_set_select,
      ntree = which.min(RMSE_ntree) * 50,
      mtry =  which.min(RMSE_mtry))

# make predictions
predictions <- predict(fit, test_set)

# calculate new RMSE
RMSE <- sqrt(sum((predictions - test_set$avg_rating)^2)/length(predictions))
print(RMSE)
## [1] 0.3683742
# R^2
R_2 <- 1 - sum((test_set$avg_rating - predictions)^2) / sum((test_set$avg_rating - mean(test_set$avg_rating))^2)
R_2
## [1] 0.5606712
# plot
ggplot() +
  geom_point(aes(x = test_set$avg_rating, y = predictions), col = "pink") +
  ggtitle("Predictions vs True Average Ratings") +
  xlab("True Average Rating") +
  ylab("Predicted Average Rating")

5.2.4 Support Vector Machines

### support vector machine
svmFit <- train(avg_rating ~.,
             data = train_set,
             method = "svmLinear",
             na.action=na.exclude,
             trControl = control,
             preProcess = c("center", "scale"),
             tuneLength = 10)
## 
## Attaching package: 'kernlab'
## The following object is masked from 'package:purrr':
## 
##     cross
## The following object is masked from 'package:ggplot2':
## 
##     alpha
svmFit
## Support Vector Machines with Linear Kernel 
## 
## 3926 samples
##  151 predictor
## 
## Pre-processing: centered (151), scaled (151) 
## Resampling: Cross-Validated (20 fold) 
## Summary of sample sizes: 3729, 3729, 3730, 3730, 3730, 3730, ... 
## Resampling results:
## 
##   RMSE       Rsquared   MAE      
##   0.3929209  0.5224474  0.2931945
## 
## Tuning parameter 'C' was held constant at a value of 1
svmPredict <- predict(svmFit,newdata = test_set)

#plot actual vs prediction
data <- data.frame(cbind(svmPredict, test_set$avg_rating))
names(data) <- c('svmPredict','avg_rating')
data %>% ggplot(aes(avg_rating, svmPredict)) +
  geom_point(color =  'hotpink2') +
  xlim(c(6,8)) +
  ylim(c(6,8)) +
  xlab('Actual average_rating') +
  ylab('SVM_prediction')
## Warning: Removed 67 rows containing missing values (geom_point).


Section 6. Board Game Recommender

6.1 Introduction to Board Game Recommender

We also built a simple board game recommender: a user enters their favorite board game, and we recommend several games they might like, based on how similar those games are to the favorite.

# import data
df <- read.csv("df_recode_final_1127", sep = "|")

# only keep data on or after 1980
df1 <- df %>% filter(year >= 1980)

# omit na
df1_na_omit <- na.omit(df1)

# drop columns not wanted
drops <- c("rank", "bgg_url", "game_id", "image_url", "mechanic", "category", "designer")
df_rec <- df1_na_omit[ , !(names(df1_na_omit) %in% drops)]
# Recommender function
# finds the games whose mechanic/category indicators are closest to game_name
# (Euclidean distance, closest 2%), then returns the 10 with the highest geek_rating
get_most_simi <- function(game_name, df) {
  # get only the mechanics and category columns
  df_new <- df[, c(1, 20:153)]
  #df_new <- df
  if (game_name %in% df_new$names) {
    print("Yay your game is found!")
    # create a vector of the features of the user's favorite game
    game_played <- as.numeric(as.vector(df_new[df_new$names == game_name, ]))[-1]
    score <- numeric(0)
    for (i in 1:dim(df_new)[1]) {
      score <- c(score, 
                 dist(list(game_played, as.numeric(df_new[i, -1])), method = "Euclidean"))
    }
    names(score) <- df_new[, 1]
    
    games <- names(score)
    score <- as.data.frame(score)
    score$game <- games
    score <- score[order(score$score), ]
    similar_games <- score %>% 
      filter(score < quantile(score, 0.02) & 
               score != 0)
    game_list <- df %>%
      filter(names %in% similar_games$game)
    recommendations <- game_list[order(game_list$geek_rating, decreasing = TRUE), ] %>%
      select(names) %>%
      head(10)
    return(recommendations)
  }
  else 
    print("Loading should only take a few seconds! If no games appear, please try another :)")
}

# example
# input: Kingdom Builder
get_most_simi("Kingdom Builder", df_rec)
## [1] "Yay your game is found!"
##                      names
## 1              Carcassonne
## 2             Web of Power
## 3  Carcassonne: The Castle
## 4                  Domaine
## 5                Gold West
## 6                   Rattus
## 7                Löwenherz
## 8                   Fjords
## 9                   Barony
## 10        Guilds of London
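
The function above computes one distance per loop iteration. As a hedged sketch, the same Euclidean distances can be computed for all games at once in base R. The `games` data frame and `distance_to_all()` are hypothetical stand-ins for `df_rec` and the loop inside `get_most_simi()`, and the sketch assumes game names are unique.

```r
# toy stand-in for df_rec: a name column followed by 0/1 indicator columns
games <- data.frame(
  names = c("A", "B", "C", "D"),
  dice_rolling = c(1, 1, 0, 0),
  card_drafting = c(0, 0, 1, 1),
  fantasy = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: Euclidean distance from one game to every other game
distance_to_all <- function(game_name, df) {
  features <- as.matrix(df[, -1])
  target <- features[df$names == game_name, ]
  # subtract the target row from every row, then take row-wise norms
  d <- sqrt(rowSums(sweep(features, 2, target)^2))
  names(d) <- df$names
  # drop the query game itself and sort from most to least similar
  sort(d[names(d) != game_name])
}

distance_to_all("A", games)
```

Unlike the original function, which drops every game at distance 0, this sketch drops only the query game, so games with identical mechanics and categories remain candidates.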

6.2 Board Game Recommender Shiny App

We built a board game recommender Shiny app in which a user enters the name of a board game and the app outputs a table of 10 recommended board games.

#shiny app
ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      # add a title
      titlePanel("Board Game Recommender"),
      textInput("text", label = h3("Game"), value = "Name a game :)")
    ),
    mainPanel(
      titlePanel("Recommended Games"),
      tableOutput("table")
    )
  )
)
server <- function(input, output) {
  output$value <- renderText({ input$text})
  output$table <- renderTable({
    recommendations <- get_most_simi(input$text, df_rec)
  })
}
shinyApp(ui=ui,server=server)
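
When the entered game is not in the data, `get_most_simi()` returns a printed message rather than a table. A minimal sketch of one way to handle this in the app uses Shiny's `validate()` and `need()` to show a message until a known game is entered. `known_games` and `recommend()` are hypothetical stand-ins for `df_rec$names` and `get_most_simi()`.

```r
library(shiny)

# hypothetical stand-ins so the sketch is self-contained
known_games <- c("Carcassonne", "Kingdom Builder")
recommend <- function(game) data.frame(names = setdiff(known_games, game))

ui <- fluidPage(
  textInput("text", label = "Game", value = ""),
  tableOutput("table")
)

server <- function(input, output) {
  output$table <- renderTable({
    # show a message instead of a table until a known game is entered
    validate(need(input$text %in% known_games,
                  "Game not found. Please try another title."))
    recommend(input$text)
  })
}

# shinyApp(ui = ui, server = server)  # uncomment to launch interactively
```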

Section 7. Final Analysis

We used exploratory data analysis (EDA) and machine learning to analyze the board game dataset. In the EDA, we explored how different characteristics of a board game relate to its average rating, and we then built models to predict the rating from candidate predictors. Several findings stood out. The average rating differs from the geek rating across both categories and mechanics over the years. Within each age group and each player group, preferred mechanics were more similar than preferred categories. The top-rated categories were card game, economics, fighting, and fantasy. Card games and fantasy themes gradually took over the market, while war games lost popularity over the years. The top-rated mechanics were variable player powers, dice rolling, hand management, and card drafting, and hand management became more popular over the years. Also, the longer a game takes to play, the more likely players were to rate it highly.

We used four machine learning methods to formally assess the association between board game characteristics and the average rating. First, we fit a linear regression with stepwise selection by AIC; the final model had 24 significant predictors and RMSE = 0.42. In this model, game difficulty had the strongest influence on the average rating. Of all significant categories, storytelling was most strongly associated with higher average ratings; of all significant mechanics, war games were most strongly associated with higher average ratings; and rock-paper-scissors games were most strongly associated with lower average ratings. Second, we fit k-nearest neighbors with 20-fold cross-validation; the optimal k was 21, with a cross-validated RMSE of 0.47. Third, we fit a random forest; the tuned model used ntree = 450 and mtry = 27 and had a test-set RMSE of 0.37. Based on the random forest feature importances, game difficulty, year, maximum and minimum play time, age requirement, and player groups were the most important features. Fourth, we fit a linear support vector machine with 20-fold cross-validation, which gave a cross-validated RMSE of 0.39. Of the values reported above, the random forest had the lowest RMSE, meaning its predictions were typically about 0.37 points from the true rating, although its test-set figure is not strictly comparable with the cross-validated figures for KNN and the SVM. Several board game categories and mechanics tend to do better than others. To maximize the chance that a board game becomes popular, a designer could build a game in one of these top categories or use some of the popular mechanics.

We also built a board game recommender. Given a board game, it recommends up to ten similar games, where similarity is measured by Euclidean distance over the games' mechanics and categories.